Welcome to the BEHAV3D Tumor Profiler Demo. Here you can find a demo
example of how the Behaved Tumor Profiler pipeline works and what each
of the outputs can provide.
Tu run the pipeline for yourself, please refer to the Github
page, where you can find a wider explanation of the pipeline, as
well as the Google
Colab and R script versions.
Firstly all the necessary libraries must be installed
library(dplyr)
library(dtwclust)
library(e1071)
library(emmeans)
library(ggplot2)
library(gtools)
library(parallel)
library(pheatmap)
library(plotly)
library(plyr)
library(RANN)
library(readr)
library(reshape2)
library(scales)
library(sp)
library(spatstat)
library(stats)
library(tidyr)
library(umap)
library(viridis)
library(zoo)
We set a seed for reproducibility:
set.seed(123)
Input data can be obtained from image analysis software like Imaris (Oxford Instruments), where you extract the relevant features for each cell types. Each cell type must be inputted separately into the pipeline.
Input data must follow the following format:
Mouse_CellType_timelapse_LargeScaleRegion_Feature.csv
| Expected Cell types | |||
|---|---|---|---|
| Tumor Cells | SR101 | MG (Microglia) | BV (Blood Vessel) |
Example:
2430F13_SR101_timelapse_CL3_2_Distance.csv
Mouse information includes: 2430 -> Mother |
F -> Sex | 13 -> Day
LargeScaleRegion information includes: CL2 -> Class
(Environmental Cluster) | 2430F13.CL2_2 or just
CL2_2 -> Position
These are the features to be uploaded:
DisplacementDistance_tumorDisplacement_Delta_LengthDisplacement_LengthPositionSpeedTime
The data is outputed by softwares like Imaris in a .csv
format.
For example, a Tumor cells 2430F11 CL1_1 displacement file would look like this (Each row represents a single cell track):
And a SR101 2430F16 CL1_1 Position file would look like this:
There are two ways to start or continue your analysis in BEHAV3D Tumor Profiler:
We need to import the dataset and classifying the information into
distinct dataframes
Create a reading function, that reads input from Imaris software:
## function to import the stats of the tracked tumor cells data from csv files extracted by Imaris
read_plus <- function(flnm) {
read_csv(flnm, skip = 3, col_types = cols()) %>%
mutate(filename = flnm)
}
Now we can import the datafiles of interest. We are basically extracting
the information from .csv files and creating the
corresponding dataframes
## set directory where csv files are located
library(plyr)
working_directory <- "D:/BEHAV3D_Tumor_Profiler/data/Tracked_tumor_cells"
setwd(working_directory)
# import volume per organoid object
pat = "*Displacement"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
files<-files[seq(1, length(files), 3)]
disp_csv <- ldply(files, read_plus)
# import distance to tumor
pat = "*Distance_tumor"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
dist_tumor_csv <- ldply(files, read_plus)
# import area per organoid object
pat = "*Displacement_Delta_Length"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
dipl_delta_csv <- ldply(files, read_plus)
# import area per organoid object
pat = "*Displacement_Length"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
dipl_length_csv <- ldply(files, read_plus)
# import position per organoid object
pat = "*Position"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
pos_csv <- ldply(files, read_plus)
# import speed
pat = "*Speed"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
speed_csv <- ldply(files, read_plus)
# import Time
pat = "*Time"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
time_csv <- ldply(files, read_plus)
### ## make each ID unique (IDs imported from different imaging files can be repeated):
category <- as.factor(dist_tumor_csv$filename)
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
dist_tumor_csv <- left_join(dist_tumor_csv, ranks)
dist_tumor_csv$ID2 <- with(dist_tumor_csv, interaction(ID, ranks))
category <- as.factor(pos_csv$filename)
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
pos_csv <- left_join(pos_csv, ranks)
pos_csv$ID2 <- with(pos_csv, interaction(ID, ranks))
category <- as.factor(time_csv$filename)
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
time_csv <- left_join(time_csv, ranks)
time_csv$ID2 <- with(time_csv, interaction(ID, ranks))
category <- as.factor(speed_csv$filename)
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
speed_csv <- left_join(speed_csv, ranks)
speed_csv$ID2 <- with(speed_csv, interaction(ID, ranks))
category <- as.factor(dipl_length_csv$filename) ##this measures how much distnace did a cell move over, a cell can mover over a big distance and stay at the same position
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
dipl_length_csv <- left_join(dipl_length_csv, ranks)
dipl_length_csv$ID2 <- with(dipl_length_csv, interaction(ID, ranks))
category <- as.factor(dipl_delta_csv$filename) ## this variable measures how much distance did the cell move from its origin
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
dipl_delta_csv <- left_join(dipl_delta_csv, ranks)
dipl_delta_csv$ID2 <- with(dipl_delta_csv, interaction(ID, ranks))
category <- as.factor(disp_csv$filename) ## a bit similar to the previous ones, this measures the area that a cell covers overtime
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
disp_csv <- left_join(disp_csv, ranks)
disp_csv$ID2 <- with(disp_csv, interaction(ID, ranks))
Firstly, we create the dataframe containing the statistics of
interest
detach(package:plyr)
##create dataframe containing the statistics of interest
master_1 <- left_join(dist_tumor_csv[,c(1,13)],time_csv[,c(1,5,6,8,10,11)],
by=c("ID2" = "ID2"))
master_1 <- left_join(master_1,disp_csv[,c(1,11)],
by=c("ID2" = "ID2"))
master_1 <- left_join(master_1,dipl_delta_csv[,c(1,11)],
by=c("ID2" = "ID2"))
master_1 <- left_join(master_1,dipl_length_csv[,c(1,11)],
by=c("ID2" = "ID2"))
master_1 <- left_join(master_1,speed_csv[,c(1,11)],
by=c("ID2" = "ID2"))
master <- left_join(master_1,pos_csv[,c(1,2,3,14)],
by=c("ID2" = "ID2"))
colnames(master) <- c("dist_tumor","ID2","Time","Tracked","TrackID", "filename","ranks","disp2","disp_d","disp_l","speed","x","y","z")
# Extract mouse data (ID) using regexp (mother+sex+day)
master$mouse <- master$filename
master$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", master$mouse, perl=TRUE)
# Extract position metadata
master$position <- master$filename
master$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", master$position, perl=TRUE)
master$position<-with(master, interaction(mouse, position))
# Extract class metadata
master$class <- master$position
master$class <- gsub(".*(CL\\d+)_.*", "\\1", master$class, perl=TRUE)
master$filename<-NULL
## Round the time
master$Time<-round(master$Time, digits=2)
##check that the imported tracks look ok
g<-ggplot(master)+geom_path(aes(x, y, group=TrackID), size=0.2,
arrow = arrow(ends = "last",type = "open",length = unit(0.01, "inches")))+
theme_bw()+theme(aspect.ratio = 1)+facet_wrap(class~position,scales = "free")
g
, the Blood Vessel information is extracted:
###Import BLOOD VESSELS stats (does not exist for all tracked positions so needs to be processed separtely.)
working_directory2 <- "D:/BEHAV3D_Tumor_Profiler/data/BV_stats"
setwd(working_directory2)
files <- list.files(path = working_directory2, full.names = T, recursive = TRUE)
BV_df <- ldply(files, read_plus)
###Process blood vessels data:
BV_df<-BV_df[c(1,6,7,8,9,11)]
colnames(BV_df) <- c("dist_BV","Time","Tracked","TrackID", "ID","filename")
## rename mouse accroding to filename:
BV_df$mouse<-BV_df$filename
BV_df$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", BV_df$mouse, perl=TRUE)
BV_df$position <- BV_df$filename
BV_df$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", BV_df$position, perl=TRUE)
BV_df$position <- with(BV_df, interaction(mouse, position))
BV_df$class <- BV_df$position
BV_df$class <- gsub(".*(CL\\d+)_.*", "\\1", BV_df$class, perl=TRUE)
## create ranks column based on master ranks:
ranks_master<-subset(master, position %in% BV_df$position)
ranks_master<-ranks_master[!duplicated(ranks_master$position),c(6,15)]
colnames(ranks_master)[1]<-"ranks_bv"
BV_df<- left_join(BV_df,ranks_master)
## keep only tracked cells and with a TrackID:
BV_df<-subset(BV_df, Tracked=="Tracked")
## remove any NA:
BV_df<-na.omit(BV_df)
## create unique Track_ID for Blood vessels dataframe:
BV_df$Track2 <- with(BV_df, interaction(ranks_bv, TrackID))
BV_df$Track2 <- gsub(".", '', BV_df$Track2, fixed = T)
BV_df$Track2 <- as.numeric(as.character(BV_df$Track2))
##sumarize per TrackID the average, min and max distance to blood vessels
ggplot(BV_df,aes(dist_BV))+geom_histogram()+facet_wrap(.~ position)
BV_df<-na.omit(BV_df)
BV_df$BV_contact<-ifelse(BV_df$dist_BV<4, 0, 1)
BV_df_sum<-BV_df%>%group_by(Track2)%>%summarise(BV_sd=sd(dist_BV)/sqrt(length(dist_BV)),BV_mean=mean(dist_BV), BV_min=min(dist_BV), BV_max=max(dist_BV), BV_contact=mean(BV_contact))
library(plyr)
##import Sr101 cells for the populations that have them
working_directory <- "D:/BEHAV3D_Tumor_Profiler/data/SR101_objects"
setwd(working_directory)
# import volume per organoid object
# import position per organoid object
pat = "*Position"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
pos_csv <- ldply(files, read_plus)
# import Time
pat = "*Time"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
time_csv <- ldply(files, read_plus)
### ## make each TrackID unique (TrackIDs imported from different imaging files can be repeated):
category <- as.factor(pos_csv$filename)
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
pos_csv <- left_join(pos_csv, ranks)
pos_csv$ID2 <- with(pos_csv, interaction(ID, ranks))
category <- as.factor(time_csv$filename)
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
time_csv <- left_join(time_csv, ranks)
time_csv$ID2 <- with(time_csv, interaction(ID, ranks))
detach(package:plyr)
##create dataframe containing the statistics of interest
master_SR101 <- left_join(pos_csv[,c(1,2,3,10,12)],time_csv[,c(1,9)],
by=c("ID2" = "ID2"))
colnames(master_SR101) <- c("x","y","z","filename","ID2","Time")
## rename mouse accroding to filename:
# Extract mouse data (ID) using regexp (mother+sex+day)
master_SR101$mouse <- master_SR101$filename
master_SR101$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", master_SR101$mouse, perl=TRUE)
# Extract position metadata
master_SR101$position <- master_SR101$filename
master_SR101$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", master_SR101$position, perl=TRUE)
master_SR101$position<-with(master_SR101, interaction(mouse, position))
# Extract class metadata
master_SR101$class <- master_SR101$position
master_SR101$class <- gsub(".*(CL\\d+)_.*", "\\1", master_SR101$class, perl=TRUE)
master_SR101$filename<-NULL
## Convert imported time to hours:
master_SR101$Time<-round(master_SR101$Time, digits=2)
##import MG cells for the populations that have them
library(plyr)
working_directory <- "D:/BEHAV3D_Tumor_Profiler/data/MG_objects"
setwd(working_directory)
# import position per organoid object
pat = "*Position"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
pos_csv <- ldply(files, read_plus)
# import Time
pat = "*Time"
files <- list.files(path = working_directory, pattern = pat, full.names = T, recursive = TRUE)
time_csv <- ldply(files, read_plus)
### ## make each TrackID unique (TrackIDs imported from different imaging files can be repeated):
category <- as.factor(pos_csv$filename)
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
pos_csv <- left_join(pos_csv, ranks)
pos_csv$ID2 <- with(pos_csv, interaction(ID, ranks))
category <- as.factor(time_csv$filename)
ranks <- rank(-table(category), ties.method="first")
ranks <- as.data.frame(ranks)
ranks$filename <- row.names(ranks)
time_csv <- left_join(time_csv, ranks)
time_csv$ID2 <- with(time_csv, interaction(ID, ranks))
detach(package:plyr)
##create dataframe containing the statistics of interest
master_MG <- left_join(pos_csv[,c(1,2,3,10,12)],time_csv[,c(1,9)],
by=c("ID2" = "ID2"))
colnames(master_MG) <- c("x","y","z","filename","ID2","Time")
## rename mouse accroding to filename:
# Extract mouse data (ID) using regexp (mother+sex+day)
master_MG$mouse<-master_MG$filename
master_MG$mouse <- gsub("^.*[_/](\\d*[MF]\\d+).*", "\\1", master_MG$mouse, perl=TRUE)
# Extract position metadata
master_MG$position <- master_MG$filename
master_MG$position <- gsub(".*_(CL\\d+_\\d+)_.*", "\\1", master_MG$position, perl=TRUE)
master_MG$position<-with(master_MG, interaction(mouse, position))
# Extract class metadata
master_MG$class <- master_MG$position
master_MG$class <- gsub(".*(CL\\d+)_.*", "\\1", master_MG$class, perl=TRUE)
master_MG$filename<-NULL
## Convert imported time to hours:
master_MG$Time<-round(master_MG$Time, digits=2)
Now we need to join the information from all of the sources into a
single dataframe
We first calulate the interactions between the nearest cells:
### test that tracks imported are fine:
master_test<-master%>%group_by(position)%>%mutate(x=x-min(x), y=y-min(y))
g<-ggplot(master)+geom_path(aes(x, y, group=TrackID, colour=dist_tumor), arrow = arrow(ends = "last", type = "closed",length = unit(0.005, "inches")))+theme_bw()
g<-g + theme(legend.position = "none")+facet_wrap(.~position, scales = "free")
g
## To calculate interactions between cells
List2=list()
for (m in unique(master$position)){
distance_M4<-master[which(master$position==m),]
List = list()
for (i in unique(distance_M4$Time)){
distancei <- distance_M4[distance_M4$Time==i,]
distancei2<-as.data.frame(distancei[,c(3,11,12,13)])
coordin<-ppx(distancei2, coord.type=c("t","s","s", "s"))
dist1<- nndist(coordin)
dist2<- nndist(coordin, k=2)
dist3<- nndist(coordin, k=3)
dist4<- nndist(coordin, k=4)
dist5<- nndist(coordin, k=5)
dist6<- nndist(coordin, k=6)
dist7<- nndist(coordin, k=7)
dist8<- nndist(coordin, k=8)
dist9<- nndist(coordin, k=9)
dist10<- nndist(coordin, k=10)
dist<-cbind(dist1,dist2,dist3, dist4,dist5, dist6, dist7, dist8, dist9, dist10) ### calculate distance to the three nearest neighbors
dist<-data.frame(rowMeans(dist[,c(1:3)]),rowMeans(dist))
distanceDFi_dist<-cbind(distancei, dist)
List[[length(List)+1]] <-distanceDFi_dist ## store to list object
}
master_dist <- do.call(rbind, List) ## convert List to dta
List2[[length(List2)+1]] <-master_dist
}
master_distance<-do.call(rbind, List2)
colnames(master_distance)[c(17,18)] <- c("dist_3_neigh", "dist_10_neigh")
###remove the untracked tracks:
master_distance<-subset(master_distance, Tracked=="Tracked")
## create a unique track_ID for master:
master_distance$Track2 <- with(master_distance, interaction(ranks, TrackID))
master_distance$Track2 <- gsub(".", '', master_distance$Track2, fixed = T)
master_distance$Track2 <- as.numeric(as.character(master_distance$Track2))
##check that the imported tracks look ok
g<-ggplot(master_distance)+geom_path(aes(x, y, group=Track2), size=0.2,
arrow = arrow(ends = "last",type = "open",length = unit(0.01, "inches")))+
theme_bw()+theme(aspect.ratio = 1)+facet_wrap(class~position,scales = "free")
g
##For SR101
master_distance_SR101<-subset(master_distance, mouse %in% unique(master_SR101$mouse))
### Calculate distance from Tumor cells to SR101.
### Considering that olig2 doens;t move much, let's project the position of the birghtest timepoint from each movie to all timepoints.
### Identify birghtest timepoint by the number of SR101 objects:
brightest_SR101<-master_SR101%>%group_by(Time, position)%>%summarise(n=n())%>%group_by( position)%>%filter(n==max(n))
master_SR101<-subset(master_SR101, Time %in%brightest_SR101$Time)
List2<-list()
for (m in unique(master_distance_SR101$position)){
master_pos <-master_distance_SR101%>%filter(position==m)%>%group_by(Time)%>%arrange(Time)
master_SR101_pos <-master_SR101%>%filter(position==m)%>%group_by(Time)%>%arrange(Time)
List1 = list() ## create list for ID of the contacting objects
for(t in unique(master_pos$Time)){
master_pos_t <-master_pos%>%filter(Time==t)
### calculate distance to 5 neigrest neighorbing SR101 in a radius of 40um
cells_radius<-nn2(data=master_SR101_pos[c("x","y","z")], query = master_pos_t[c("x","y","z")],k=30, treetype = 'bd',searchtype = "radius", radius = 30)
cells_min<-nn2(data=master_SR101_pos[c("x","y","z")], query = master_pos_t[c("x","y","z")],k=1, treetype = 'bd',searchtype = "standard")
dist_table = data.frame(master_pos_t[c("Time","Track2","mouse","position","class")],cells_min[["nn.dists"]],cells_radius[["nn.dists"]])
List1[[length(List1)+1]] <-dist_table
rm(temp)
}
cell_radius_t <- do.call(rbind, List1)
List2[[length(List2)+1]] <-cell_radius_t
rm(cell_radius_t)
}
master_dist_SR101 <- do.call(rbind, List2)
### convert the distant cells to 0
temp2<-master_dist_SR101[c("X1","X2","X3","X4","X5", "X6", "X7", "X8", "X9", "X10", "X11","X12","X13","X14","X15","X16","X17","X18","X19","X20", "X21","X22","X23","X24","X25","X26","X27","X28","X29","X30")]
temp2[temp2 >1.340781e+153]<-0
### Calculate number of contacts:
master_dist_SR101$n_SR101<-rowSums(temp2 != 0)
### renamen min_SR101:
colnames(master_dist_SR101)[colnames(master_dist_SR101) == "cells_min...nn.dists..."] <- "min_SR101"
### bind information on SR101 distnaces to the main dataframe:
master_dist_SR101[c("X1","X2","X3","X4","X5", "X6", "X7", "X8", "X9", "X10", "X11","X12","X13","X14","X15","X16","X17","X18","X19","X20", "X21","X22","X23","X24","X25","X26","X27","X28","X29","X30")]<-NULL
## For each Track2 find what is the minimal distance to SR101 and the max n of SR101 neiighbors in 30um diameter:
master_distance_SR101<-master_dist_SR101%>%group_by(Track2, mouse, position, class)%>%summarize(n_SR101=max(n_SR101), min_SR101 =min(min_SR101))
### plot per position types the values of
p <- ggplot(master_distance_SR101)+geom_jitter(aes(x=class, y=n_SR101))+
geom_bar(aes(x=class , y=n_SR101),alpha=0.5,stat = "summary") +ggtitle("number of neighboring SR101 cells per position type")+theme_classic()+
theme(aspect.ratio=0.7)
p
p <- ggplot(master_distance_SR101)+geom_jitter(aes(x=class, y=min_SR101))+
geom_boxplot(aes(x=class , y=min_SR101),alpha=0.5, outlier.colour = NA) +ggtitle("min distance to SR101 cells per position type")+theme_classic()+
theme(aspect.ratio=0.7)
p
##For MG
master_distance_MG<-subset(master_distance, mouse %in% unique(master_MG$mouse))
### Calculate distance from Tumor cells to MG.
### Considering that olig2 doens;t move much, let's project the position of the birghtest timepoint from each movie to all timepoints.
### Identify birghtest timepoint by the number of MG objects:
brightest_MG<-master_MG%>%group_by(Time, position)%>%summarise(n=n())%>%group_by( position)%>%filter(n==max(n))
master_MG<-subset(master_MG, Time %in%brightest_MG$Time)
List2<-list()
for (m in unique(master_distance_MG$position)){
master_pos <-master_distance_MG%>%filter(position==m)%>%group_by(Time)%>%arrange(Time)
master_MG_pos <-master_MG%>%filter(position==m)%>%group_by(Time)%>%arrange(Time)
List1 = list() ## create list for ID of the contacting objects
for(t in unique(master_pos$Time)){
master_pos_t <-master_pos%>%filter(Time==t)
### calculate distance to 5 neigrest neighorbing MG in a radius of 30um
cells_radius<-nn2(data=master_MG_pos[c("x","y","z")], query = master_pos_t[c("x","y","z")],k=30, treetype = 'bd',searchtype = "radius", radius = 30)
cells_min<-nn2(data=master_MG_pos[c("x","y","z")], query = master_pos_t[c("x","y","z")],k=1, treetype = 'bd',searchtype = "standard")
dist_table = data.frame(master_pos_t[c("Time","Track2","mouse","position","class")],cells_min[["nn.dists"]],cells_radius[["nn.dists"]])
List1[[length(List1)+1]] <-dist_table
rm(temp)
}
cell_radius_t <- do.call(rbind, List1)
List2[[length(List2)+1]] <-cell_radius_t
rm(cell_radius_t)
}
master_dist_MG <- do.call(rbind, List2)
### convert the distant cells to 0
temp2<-master_dist_MG[c("X1","X2","X3","X4","X5", "X6", "X7", "X8", "X9", "X10", "X11","X12","X13","X14","X15","X16","X17","X18","X19","X20", "X21","X22","X23","X24","X25","X26","X27","X28","X29","X30")]
temp2[temp2 >1.340781e+153]<-0
### Calculate number of contacts:
master_dist_MG$n_MG<-rowSums(temp2 != 0)
### renamen min_MG:
colnames(master_dist_MG)[colnames(master_dist_MG) == "cells_min...nn.dists..."] <- "min_MG"
### bind information on MG distnaces to the main dataframe:
master_dist_MG[c("X1","X2","X3","X4","X5", "X6", "X7", "X8", "X9", "X10", "X11","X12","X13","X14","X15","X16","X17","X18","X19","X20", "X21","X22","X23","X24","X25","X26","X27","X28","X29","X30")]<-NULL
## For each Track2 find what is the minimal distance to MG and the max n of MG neiighbors in 30um diameter:
master_distance_MG<-master_dist_MG%>%group_by(Track2, mouse, position, class)%>%summarize(n_MG=max(n_MG), min_MG =min(min_MG))
### plot per position types the values of
p <- ggplot(master_distance_MG)+geom_jitter(aes(x=class, y=n_MG))+
geom_bar(aes(x=class , y=n_MG),alpha=0.5,stat = "summary") +ggtitle("number of neighboring MG cells per position type")+theme_classic()+
theme(aspect.ratio=0.7)
p
p <- ggplot(master_distance_MG)+geom_jitter(aes(x=class, y=min_MG))+
geom_boxplot(aes(x=class , y=min_MG),alpha=0.5, outlier.colour = NA) +ggtitle("min distance to MG cells per position type")+theme_classic()+
theme(aspect.ratio=0.7)
p
Time of interest selection and substitution of NA values:
### create empyt dataframe with combination of Time of interest:
max(master_distance$Time)
## [1] 5.44
length(seq(from=0, to=5.6, by=0.2))
## [1] 29
empty_df = master_distance[1:29,] ### create a dataframe with the same size as the sequence of Time of interest:
empty_df[!is.na(empty_df)] <- NA ## make it empyt
empty_df$Time<-seq(from=0, to=5.6, by=0.2)
empty_df$Track2<-0.5
### now merge this fake dataframe with my dataframe of interest:
master_distance<-rbind(master_distance, empty_df)
### complete all the datasets so that they have the same timepoints:
master_distance<-master_distance%>%complete(Track2,Time)
### remove the fake track that is not necessary anymore:
master_distance<-subset(master_distance, !Track2==0.5)
##round the number, otherwise they are not equal
master_distance$Time<-round(master_distance$Time,2)
##if there is any track2 and time that was duplicated remove the NA
# Assuming your dataframe is named your_dataframe
master_distance <- master_distance %>%
group_by(Time, Track2) %>%
dplyr::slice(if (n() == 1) 1 else which(!is.na(dist_tumor)))
### create a for loop to refill all the NA with interpolated values:
### create a list of the columns names that need to be refilled
column_names<-names(master_distance)
column_names<-subset(column_names,!column_names %in%c("TrackID","Track2", "ranks","ID","ID2","Time","position","mouse","class","pos", "Tracked","speed"))
## create a first dataset with refilled values for speed:
time_series<-acast(master_distance, Time ~ Track2, value.var='speed',fun.aggregate = mean)
## rownames timepoints:
row.names(time_series)<-unique(master_distance$Time)
## get rid of NA
time_series_zoo<-zoo(time_series, row.names(time_series))
time_series_zoo<-na.approx(time_series_zoo) ## replace by interpolated value
time_series<-as.matrix(time_series_zoo)
time_series2<-melt(time_series)
data<-time_series2[complete.cases(time_series2), ]
colnames(data)<-c("Time", "Track2", "speed")
### ----------
for (i in column_names){
time_series<-acast(master_distance, Time ~ Track2, value.var=i,fun.aggregate = mean)
row.names(time_series)<-unique(master_distance$Time)
## get rid of NA
time_series_zoo<-zoo(time_series,row.names(time_series))
time_series_zoo<-na.approx(time_series_zoo) ## replace by last value
time_series<-as.matrix(time_series_zoo)
time_series2<-melt(time_series)
new<-time_series2[complete.cases(time_series2), ]
data[ , ncol(data) + 1] <- new[3] # Append new column
colnames(data)[ncol(data)] <- paste0(i)
}
### check that the data is correct by plotting the evolution of a certain variable overtime -----
p1 <-ggplot(data, aes(Time, disp2, group=Track2, color = as.factor(Track2))) +
geom_smooth(method = "loess",size = 0.5, se = F, alpha=0.3, span=1)+
theme_bw() +
theme(axis.text.x = element_text(size=10), axis.text.y = element_text(size=5), axis.title.y = element_text(size = 10), axis.title.x = element_text(size = 15), legend.text=element_text(size= 10))+
ggtitle("disp2")+theme(aspect.ratio=1,legend.position = "none")
p1
Then, metadata information is added to the analysis:
### add metadata info
master_cor <- data
master_metadata<- master_distance[c("position","mouse","class", "Track2")]
master_metadata<-na.omit(master_metadata)
master_metadata<-master_metadata[!duplicated(master_metadata$Track2),]
#detach(package:plyr, unload = TRUE)
master_cor<- left_join(master_cor ,master_metadata)
### Filter timepoints of interest with same time interval:
time_of_interest<-as.character(seq(from=0, to=5.6, by=0.2)) ### convert to character
master_cor$Time<-as.character(master_cor$Time)
master_cor<-subset(master_cor, Time%in%time_of_interest)
master_cor$Time<-as.numeric(master_cor$Time) ### now back to numeric
##check that the imported tracks look ok
g<-ggplot(master_cor)+geom_path(aes(x, y, group=Track2), size=0.2,
arrow = arrow(ends = "last",type = "open",length = unit(0.01, "inches")))+
theme_bw()+theme(aspect.ratio = 1)+facet_wrap(class~position,scales = "free")
g
Finally, the generated dataframes are exported to use as
Checkpoints when repeating the analysis:
Set working directory to export dataframes
setwd("D:/BEHAV3D_Tumor_Profiler/results/rds")
All imported information:
saveRDS(master_distance, "master_distance.rds")
Time of interest subset information:
saveRDS(master_cor,"master_cor.rds")
Microglia information:
saveRDS(master_distance_MG,"master_distance_MG.rds")
SR101 information:
saveRDS(master_distance_SR101,"master_distance_SR101.rds")
Blood Vessel information:
saveRDS(BV_df_sum,"BV_df_sum.rds")
.rds files obtained.If the RDS files are in the working directory, the BEHAV3D Tumor
Profiler script will skip the data input and initial pre-processing, by
using the provided .rds files.
setwd("D:/BEHAV3D_Tumor_Profiler/results")
master_cor<-readRDS("master_cor.rds")
master_distance<-readRDS("master_distance.rds")
master_distance_MG<-readRDS("master_distance_MG.rds")
master_distance_SR101<-readRDS("master_distance_SR101.rds")
BV_df_sum<-readRDS("BV_df_sum.rds")
print("data already imported")
## [1] "data already imported"
After the .rds checkpoints, we can continue with joint
data processing.
In this section, the joint dataframes are adjusted so all track have the same length
### sumarize the length of the tracks:
## create relative time
master_cor2<-master_cor %>%
group_by(Track2) %>%arrange(Time)%>%mutate(Time2 = Time - dplyr::first(Time))
ggplot(master_cor2,aes(Time2))+geom_histogram()+facet_wrap(.~ position)
This plot shows a summary of the imported tracks for each position and
time
max_time<-master_cor2%>%group_by(position)%>%summarise(max_Time=max(Time2))
min_time <- min(max_time$max_Time) #this is the minimal time that any position has
hist(max_time$max_Time)
Here we can see the track length distribution, so we can normalize all tracks to the minimal common time:
master_cor2<-master_cor2 %>%
group_by(Track2)%>%filter(Time2<min_time) ##Minimal time for a position is 2.6
## keep only the tracks that are of the same length (max number of timepoints (13 in this case), this is how they were defined)
# Calculate the most frequent n for each group
subtrack_length<-master_cor2%>%group_by(Track2, position)%>%
summarise(n=n_distinct(Time2), first_t=min(Time2), last_t=max(Time2))%>%
ungroup()
hist(subtrack_length$n)
This histogram shows the track length across all tracks, so the most
common track length can be selected
freq_n <- as.numeric(which.max(table(subtrack_length$n))) # Most frequent n
subtrack_length<-subtrack_length%>%filter(n==freq_n) # we need to automate this to get the most frequent n
#Now we only keep these subtracks that have the same length:
master_cor2<-subset(master_cor2, Track2%in%subtrack_length$Track2)
Plot the unique tracks for each position using the tumor distance as
color label:
### create a parameter for distance to tumor core
master_cor2<-master_cor2 %>%
group_by(position)%>%mutate(dist_tumor_1 = scales::rescale(dist_tumor, to=c(0,100)))
### normalize the distance to tumor factor per position position
master_cor2<-master_cor2 %>%
group_by(Track2) %>%arrange(Time2)%>%mutate(dist_tumor = dist_tumor - dplyr::first(dist_tumor))%>%ungroup()
#### check tracks
master_M4_test<-master_cor2%>%group_by(position, mouse)%>%mutate(x=x-min(x), y=y-min(y))
g<-ggplot(master_M4_test)+geom_path(aes(x, y, group=Track2, colour=dist_tumor), arrow = arrow(ends = "last", type = "closed",length = unit(0.005, "inches")))+theme_bw()
g<-g + theme(legend.position = "none")+scale_colour_gradient2(low = "blue", mid = "grey" , high = "red") +facet_wrap(.~position)
g
Number of unqiue tracks that are processed:
length(unique(master_cor2$Track2))
## [1] 981
Now we need to do some feature engineering, since we make a sliding
window analysis we need to requantify the values of displacement, speed,
etc from scratch
## scaling
library(parallel)
library(dplyr)
library(dtwclust)
library(stats)
library(scales)
#detach(package:spatstat, unload = TRUE)
###normalize the data:
master_norm<-master_cor2%>%ungroup()
column_names2<-names(master_cor2)
## select what data to normalize
column_names2<-subset(column_names2,!column_names2 %in%c("Time","Track2","x","y","z","position","mouse","class","Time2", "dist_tumor_1", "dist_3_neigh", "dist_10_neigh", "iteration", "Track2"))
master_norm<-as.data.frame(master_norm)
# transform skwed variables
# Calculate skewness for each variable
skew_values <- apply(master_norm[c(column_names2)], 2, skewness)
# Identify variables with skewness greater than a threshold (e.g., 0.5)
skewed_variables <- names(skew_values[abs(skew_values) > 2])
# Apply logarithmic transformation to skewed variables
master_norm[ , skewed_variables] <- log1p(master_norm[ , skewed_variables])
We performa a PCA dimensionality reduction to prepare for the
multivariate analysis
### perfrom PCA on the factors I want to use
master_scaled<-master_norm[c(column_names2)]%>%mutate(across(everything(), scale))%>%ungroup() ##first scale
##now to give more weight to the varibale of dist_tumor so the ability of cells to leave
#master_scaled[,c("dist_tumor")]<-master_scaled[,c("dist_tumor")]*2
master_pca<-prcomp(master_norm[c(column_names2)], scale = T)
master_pca2<-master_pca[["x"]] ##select only PC
##Semi-supervised learning, we introduce as well the distance to tumor variable that defines the direction of the cells, since it is the most important for us
## Make an option of running completely unsupervised (only include the pca) and semi-supervised learning here
master_pca3<-data.frame(master_norm[,c(2)],master_pca2[,c(1:3)]) ### take only first 3 components since they explain most variation
Choose between an unsupervised or a semi-supervised approach. The
semi-supervised approach introduces the distance to tumor variable in
the analysis
Analysis <- "Semi-supervised" # @param ["Semi-supervised", "Unsupervised"]
if (Analysis == "Semi-supervised"){
colnames(master_pca3)[1]<- "Track2"
print("Semi-supervised Analysis")
} else{
print("Unsupervised Analysis")
}
## [1] "Semi-supervised Analysis"
Plot the PCA information:
pcaCharts <- function(x) {
x.var <- x$sdev ^ 2
x.pvar <- x.var/sum(x.var)
print("proportions of variance:")
print(x.pvar)
par(mfrow=c(2,2))
plot(x.pvar,xlab="Principal component", ylab="Proportion of variance explained", ylim=c(0,1), type='b')
plot(cumsum(x.pvar),xlab="Principal component", ylab="Cumulative Proportion of variance explained", ylim=c(0,1), type='b')
screeplot(x, main="Absolute Variance", xlab="Principal Component")
screeplot(x,type="l", main="Absolute Variance")
par(mfrow=c(1,1))
}
pcaCharts(master_pca)
## [1] "proportions of variance:"
## [1] 0.5643365414 0.1929346874 0.1787666118 0.0631326010 0.0008295584
Once we have normalized, adjusted and prepared all the data for analysis, it is time to proceed to the analysis modules.
The BEHAV3D Tumor Profiler is divided in three distinct modules:
1. Heterogeneity Module - Implements multiparametric single-cell time-series classification, allowing us to identify distinct single-cell behavioral patterns
2. Large-scale phenotyping module - Performs large-scale TME phenotyping and identifies regions with a specific cellular composition and architecture within the TME of intravitally imaged tumors
3. Small-scale phenotyping module - Further refines TME phenotyping to better understand tumor cell behavior
Modules are collapsed to facilitate navigation. Please unscroll the modules you would like to check
We will firstly create an output directory for the module:
## Create the output directory
folder <- "D:/BEHAV3D_Tumor_Profiler/results/heterogeneity_module"
if (file.exists(folder)) {
cat("The folder already exists")
} else {
dir.create(folder)
}
## The folder already exists
Dynamic Time Warping (DTW) is an algorithm used to measure similarity between two time series by aligning them in a way that minimizes the distance between corresponding points. It is particularly useful for comparing time series that may vary in speed or timing, like on this case.
If the matrix_distmat.rds file exists, it will skip the
DTW analysis time, as it has been run previously. Else, it will run
normally and generate the file at the end.
###MULTIVARIATE
list_master_pca <- split(master_pca3[,-c(1)],master_pca3$Track2) ## split into list
if ( ! file.exists("D:/BEHAV3D_Tumor_Profiler/results/matrix_distmat.rds")){
##parallel working
# load parallel
library(parallel)
# create multi-process workers
workers <- makeCluster(detectCores()-2)
# load dtwclust in each one, and make them use 1 thread per worker
invisible(clusterEvalQ(workers, {
library(dtwclust)
RcppParallel::setThreadOptions(1L)
}))
# register your workers, e.g. with doParallel
require(doParallel)
registerDoParallel(workers)
###MULTIVARIATE
set.seed(123)
distmat <- proxy::dist(list_master_pca, method = "dtw")
saveRDS(distmat, file = "distmat.rds")
matrix_distmat<-as.matrix(distmat)
saveRDS(matrix_distmat, file = "matrix_distmat.rds")
}else{
matrix_distmat <- readRDS("matrix_distmat.rds")
}
Track2<-as.numeric(names(list_master_pca))
Now, we can perform the UMAP representation and cluster in the desired number of clusters. Any of the parameters can be modified to your convenience
# Adjust parameters according to the experiment characteristics
umap_dist<- umap(matrix_distmat,n_components=2,input="dist",init = "random",
n_neighbors=7,min_dist=0.5, spread=6,n_epochs=1000 , local_connectivity=1)
umap_1 <- as.data.frame(umap_dist$`layout`)
#scatter3Drgl(umap_1$V1, umap_1$V2, umap_1$V3)
Plot of the raw UMAP
plot(umap_dist$`layout`, # Plot the result
col=rgb(0,0,0,alpha=0.1),
pch=19,
asp=0.4, xlab="UMAP 1", ylab="UMAP 2",
main="Raw UMAP")
Here, we determine the number of clusters for the analysis.
(n_cluster)
umap_1 <- as.data.frame(umap_dist$`layout`)
Track2_umap<-cbind(Track2,umap_1)
positiontype<-master_norm[,c("Track2", "position","class","mouse")]
positiontype<- positiontype[!duplicated(positiontype$Track2),]
umap_2 <- left_join(Track2_umap ,positiontype)
# Clustering
n_cluster=7
hc <- hclust(dist(umap_dist$`layout`, method = "euclidean"), method = "ward.D2")
hierarchical_clusters <- cutree(hc, k = n_cluster)
umap_3 <- cbind(hierarchical_clusters, umap_2)
colnames(umap_3)[1]<- "cluster2"
Here. we can obtain different UMAP representations, providing our color
palette
##plot
# Make a color palette that adjusts to the cluster direction
# Generate a continuous palette with 100 colors
continuous_palette <- colorRampPalette(c("cyan4", "darkturquoise","darkorchid4","darkorchid1", "deeppink4","deeppink1", "goldenrod3", "gold"))(50)
# Select 8 colors from the continuous palette
mypalette_1<- continuous_palette[seq(1, length(continuous_palette), length.out = n_cluster)]
ggplot(umap_3, aes(x=V1, y=V2, color=as.factor(cluster2))) +
geom_point(size=2, alpha=0.8) + labs(color="cluster")+
xlab("") + ylab("")+
ggtitle("umap Cluster") +scale_color_manual(values=mypalette_1)+
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)
UMAP positions colored by large-scale regions
ggplot(umap_3, aes(x=V1, y=V2, color=as.factor(class))) +
geom_point(size=0.5, alpha=0.8) + labs(color="Environment_cluster")+
xlab("") + ylab("")+
ggtitle("umap Cluster") +scale_color_manual(values=c("cyan","gold","red"))+
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+facet_wrap(mouse~position)
UMAP colored by large-scale regions
ggplot(umap_3, aes(x=V1, y=V2, color=as.factor(class))) +
geom_point(size=2, alpha=0.6) + labs(color="Tumor region phenotype")+
xlab("") + ylab("")+
ggtitle("distribution of large scale regions")+scale_color_manual(values=c("cyan","gold","red"))+
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)
### UMAP direction
Here, we incorporate the MG and SR101 information to project the direction information into the UMAP
### calculate average stats per track
master_class <- left_join(master_cor2 ,umap_3[c(1:4)], by = c("Track2" = "Track2"))
master_class<-master_class %>%
filter(cluster2 != 0)
master_class$cluster2<-as.numeric(master_class$cluster2)
master_class_sum<-master_class%>%group_by(position,class,mouse, Track2)%>%
arrange(Time)%>%summarize(cluster2=mean(cluster2), V1=mean(V1), V2=mean(V2),
dist_tumor=mean(dist_tumor), dist_3_neigh=mean(dist_3_neigh),
dist_10_neigh=mean(dist_10_neigh), speed=mean(speed), disp2=mean(disp2),
disp_d=mean(disp_d),disp_l=mean(disp_l), movement=last(dist_tumor),
distance_to_tumor=last(dist_tumor_1),raw_dist_tumor=last(dist_tumor),
Time=first(Time))%>%ungroup()
## Join the information on the SR101 and MG
master_class_sum <- left_join(master_class_sum,master_distance_MG[c("Track2","n_MG","min_MG")] ,by = c("Track2" = "Track2"))
master_class_sum <- left_join(master_class_sum,master_distance_SR101[c("Track2","n_SR101","min_SR101")] ,by = c("Track2" = "Track2"))
master_class_sum <- left_join(master_class_sum,BV_df_sum ,by = c("Track2" = "Track2"))
mid <- 0
master_class_sum$movement2<-ifelse(master_class_sum$movement>0,"away from tumor", ifelse(master_class_sum$movement==0, "no movement", "towards the tumor"))
master_class_sum<-master_class_sum[!is.na(master_class_sum$cluster2),]
saveRDS(master_class_sum,"master_class_sum.rds")
write.csv(master_class_sum,"master_class_sum.csv")
And here is the resulting plot:
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=movement)) +
geom_point(size=2, alpha=0.8) +labs(color="Direction")+
xlab("") + ylab("") +
ggtitle("Direction of movement relative to tumor core") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+theme(aspect.ratio=1)+
scale_color_gradient2(midpoint = mid, low = "blue",mid="grey80",
high = "red3")
In this section, we project the relevant clustering features in a Heatmap
###Create heatmap
sum_all<-master_class_sum%>%group_by(cluster2)%>%
summarise(displacement2 = median(disp2),
speed = median(speed),displacement_delta=median(disp_d),displacement_length=median(disp_l),max_speed=quantile(speed, 0.75), persistance=(displacement_length/displacement_delta),nearest_10_cell= median(dist_10_neigh),dist_tumor_core=median(distance_to_tumor),
n_SR101=mean(n_SR101, na.rm = T),
min_SR101=mean(min_SR101, na.rm = T),n_MG=mean(n_MG, na.rm = T),min_MG=mean(min_MG, na.rm = T),min_BV_dist=mean(BV_min, na.rm = T),sd_BV_dist=mean(BV_sd, na.rm = T),mean_BV_dist=mean(BV_mean, na.rm = T),
movement=median(movement) )
Cluster_movement<-sum_all[,c(1,length(names(sum_all)))]
Cluster_movement <-Cluster_movement[order(Cluster_movement$movement),]
stdize = function(x, ...) {(x - min(x, ...)) / (max(x, ...) - min(x, ...))}
sum_all[-c(1)] <- lapply(sum_all[-c(1)], stdize, na.rm = T)
df<-sum_all[,-c(1)]
df<-df[order(match(rownames(df), Cluster_movement$cluster2)), , drop = FALSE]%>%ungroup()
df<-as.data.frame(df)
rownames(df)<-Cluster_movement$cluster2
We show all the features extracted in the analysis:
heat_m<-pheatmap(df, clustering_method = "complete", cluster_cols=F,cluster_rows=F,cellwidth=20, cellheight=20,
treeheight_row = 0, treeheight_col = 0,fontsize = 8,na_col = "grey50", main="Features",
angle_col = 90,color = colorRampPalette(c("deepskyblue1", "grey95", "deeppink2"))(50))
df_cl_features<-df[,c("displacement2", "speed", "max_speed", "displacement_delta" , "displacement_length", "persistance", "movement")]
rownames(df_cl_features)<-rownames(df)
And a heatmap of the features used for clustering:
pheatmap(df_cl_features, clustering_method = "complete", cluster_cols=F,cluster_rows=F,cellwidth=20, cellheight=20,
treeheight_row = 0, treeheight_col = 0,fontsize = 8,na_col = "grey50", main="Features used for clustering",
angle_col = 90,color = colorRampPalette(c("deepskyblue1", "grey95", "deeppink2"))(50))
We also print tha Cluster movement data (towards or away from the
tumor):
print(Cluster_movement) #3 per cluster we can see the direction
## # A tibble: 7 × 2
## cluster2 movement
## <dbl> <dbl>
## 1 5 -3.50
## 2 4 -1.22
## 3 6 -0.501
## 4 1 0.0203
## 5 3 2.38
## 6 7 3.12
## 7 2 7.60
Plots the clusters according to direction and other relevant
features
## Movement within the clusters in UMAP
## plot clusters giving them a color according to direction
# Convert df to a data frame
# Add cluster2 as a column
df1<-as.data.frame(df)
df1 <- mutate(df1, cluster2 = row.names(df))
# Order df1 by movement and reorder cluster2 accordingly
df1 <- df1 %>%
arrange(movement) %>%
mutate(cluster2 = factor(cluster2, levels = unique(cluster2)))
##set first order clusters based on the heatmap:
UMAP representation with the behavioral clusters:
ggplot(master_class_sum, aes(x = V1, y = V2, color = as.factor(cluster2))) +
geom_point(size = 2, alpha = 1) +
labs(color = "Cluster") +
xlab("") +
ylab("") +
scale_color_manual(values = setNames(mypalette_1, levels(df1$cluster2))) + # Match palette to cluster2 levels ggtitle("UMAP Cluster") +
theme_light(base_size = 20) +
theme(axis.text.x = element_blank(),
axis.text.y = element_blank(),
aspect.ratio = 1)
UMAP representation with the behavioral clusters among mice:
ggplot(umap_3, aes(x=V1, y=V2, color=as.factor(cluster2))) +
geom_point(size=1, alpha=0.8) + labs(color="cluster")+
xlab("") + ylab("")+
ggtitle("umap Cluster among mice") +scale_color_manual(values = setNames(mypalette_1, levels(df1$cluster2))) +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+facet_wrap(mouse~.)
UMAP representation with the localization bistribution of each
track:
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=distance_to_tumor)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP localization distribution") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")))
UMAP representation with the raw movement of each track:
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=movement)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP raw_movement") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")))
UMAP representation with the speed distribution of each track:
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=speed)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP speed distribution") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")))
UMAP representation with the squared displacement disitribution of each
track:
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=disp2)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP disp2distribution") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")))
UMAP representation with the displacement length of each track:
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=disp_l)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP disp_length") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")))
Provides a pie chart of the cluster distribution along the mice
experiments
### plot enrichment on edge and core:
Number_cell_exp<-umap_3%>%group_by(position,class,mouse)%>%
summarise(total_cell = n())
Percentage_clus<-left_join(Number_cell_exp,umap_3)
Percentage_clus <- Percentage_clus%>%group_by(cluster2,position,class,mouse)%>%
summarise(total_cell = mean(total_cell), num_cluster=n())%>%mutate(percentage=num_cluster*100/total_cell)%>%ungroup()
Percentage_clus_2 <- Percentage_clus%>%group_by(cluster2, position,class,mouse)%>%summarise(se_percentage=sd(percentage)/sqrt(length(percentage)),percentage = mean(percentage))
### Plot circular map
Percentage_clus_2$cluster2<-as.factor(Percentage_clus_2$cluster2)
Percentage_clus_2$cluster2<- factor(Percentage_clus_2$cluster2, levels = levels(df1$cluster2))
Behavioral cluster distribution in each mouse
Per2<-ggplot(Percentage_clus_2, aes(fill=as.factor(cluster2), y=percentage, x="")) +
geom_bar( stat="identity", position="fill", width = 0.5)+ coord_flip()+ scale_y_reverse()
Per2 <- Per2 + facet_grid(.~mouse)
Per2<-Per2+theme_void()+ scale_fill_manual(values=mypalette_1)+theme(strip.text.x = element_text(size = 8, angle = 90))
Per2
Mouse distribution in eacch behavioral cluster
Per4<-ggplot(Percentage_clus_2, aes(fill=as.factor(mouse), y=percentage, x="")) +
geom_bar( stat="identity", position="fill", width = 0.5)+ coord_flip()+ scale_y_reverse()
Per4 <- Per4 + facet_grid(cluster2~.)
Per4<-Per4+theme_void()+theme(strip.text.x = element_text(size = 8, angle = 90))
Per4
This section analyzes the differences between clusters for different features and provides statistical support in each case.
### plot differences of the values per cluster
master_class_sum<-as.data.frame(master_class_sum)
master_class_sum$cluster2<-as.character(master_class_sum$cluster2)
df1$mean_movement<-df1$movement
master_class_sum2<-left_join(master_class_sum, df1[,c("cluster2","mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels=df1$cluster2)
#### try an anova between clusters based speed
p_speed <- ggplot(master_class_sum2,aes(x=as.factor(cluster2) , y=speed, group=cluster2,fill=cluster2)) +geom_jitter(width=0.2, alpha=0.5) +
geom_boxplot(outlier.colour = NA, alpha=0.5) +ggtitle("speed per cluster")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,20))
a_speed <- aov(speed ~ cluster2, data=master_class_sum2)
#### try an anova between clusters based direction
p_direction <- ggplot(master_class_sum2,aes(x=as.factor(cluster2) , y=movement, group=cluster2,fill=cluster2)) +geom_jitter(width=0.2, alpha=0.5) +
geom_boxplot(outlier.colour = NA, alpha=0.5) +ggtitle("speed per cluster")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(-15,20))
a_direction <- aov(movement ~ cluster2, data=master_class_sum2)
#### try an anova between clusters based on distance to dist_3_neigh
p_dist3_neigth <- ggplot(master_class_sum2,aes(x=as.factor(cluster2) , y=dist_3_neigh, group=cluster2, fill=cluster2)) +geom_jitter(width=0.2, alpha=0.5) +
geom_boxplot(alpha=0.5, outlier.colour = NA)+ggtitle("dist 3 neigh per cluster")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(10,70))
a_dist_3_neigh <- aov(dist_3_neigh ~ cluster2, data=master_class_sum2)
#### try an anova between clusters based on distance to dist_10_neigh
p_dist_10_neigh <- ggplot(master_class_sum2,aes(x=as.factor(cluster2) , y=dist_10_neigh, group=cluster2,fill=cluster2)) +geom_jitter(width=0.2, alpha=0.5) +
geom_boxplot( outlier.colour = NA, alpha=0.5) +ggtitle("dist 10 neigh per cluster")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(20,90))
a_dist_10_neigh <- aov(dist_10_neigh ~ cluster2, data=master_class_sum2)
#### try an anova between clusters based on dist to MG
p_min_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)) ,aes(x=as.factor(cluster2) , y=min_MG, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot( outlier.colour = NA, alpha=0.5) +ggtitle("distance to closest MG")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(2,45))
a_min_MG <- aov(min_MG ~ cluster2, data= subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on dist to SR101
p_min_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)) ,aes(x=as.factor(cluster2) , y=min_SR101, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot(outlier.colour = NA, alpha=0.5) +ggtitle("distance to closest SR101")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,50))
a_min_SR101 <- aov(min_SR101 ~ cluster2, data= subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on n of MG
p_n_MG <- ggplot(subset(master_class_sum2, !is.na(n_MG)) ,aes(x=as.factor(cluster2) , y=n_MG, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot( outlier.colour = NA, alpha=0.5) +ggtitle("n MG")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,15))
a_n_MG <- aov(n_MG ~ cluster2, data= subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on n of SR101
p_nSR101 <- ggplot(subset(master_class_sum2, !is.na(n_SR101)) ,aes(x=as.factor(cluster2) , y=n_SR101, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot(outlier.colour = NA, alpha=0.5) +ggtitle("n SR101")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,20))
a_nSR101 <- aov(n_SR101 ~ cluster2, data= subset(master_class_sum2, !is.na(n_SR101)))
#### try an anova between clusters based on mean distance to BV
p_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)) ,aes(x=as.factor(cluster2) , y=BV_mean, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot(outlier.colour = NA, alpha=0.5) +ggtitle("BV_mean")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,20))
a_BV_mean <- aov(BV_mean ~ cluster2, data= subset(master_class_sum2, !is.na(BV_mean)))
#### try an anova between clusters based on min distance to BV
p_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)) ,aes(x=as.factor(cluster2) , y=BV_min, group=cluster2, fill=cluster2)) +geom_jitter(width=0.2, alpha=0.5) +
geom_boxplot( outlier.colour = NA, alpha=0.5) +ggtitle("BV_min")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,20))
a_BV_min <- aov(BV_min ~ cluster2, data= subset(master_class_sum2, !is.na(BV_min)))
#### try an anova between clusters based on contact to BV
p_BV_contact <- ggplot(subset(master_class_sum2, !is.na(BV_min)) ,aes(x=as.factor(cluster2) , y=BV_contact, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot(outlier.colour = NA, alpha=0.5) +ggtitle("BV_contact")+theme_classic()+ scale_fill_manual(values=mypalette_1)+
theme(aspect.ratio=0.7)
a_BV_contact <- aov(BV_contact ~ cluster2, data= subset(master_class_sum2, !is.na(BV_contact)))
#### try an anova between clusters based on sd distance to BV
p_BV_sd <- ggplot(subset(master_class_sum2, !is.na(BV_min)) ,aes(x=as.factor(cluster2) , y=BV_sd, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot( outlier.colour = NA, alpha=0.5) +ggtitle("Standart deviation distance")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,1.7))
a_BV_sd <- aov(BV_sd ~ cluster2, data= subset(master_class_sum2, !is.na(BV_min)))
Statistical information based on ANOVA and Tukey’s HSD (Honestly
Significant Difference) test:
summary(a_speed)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 5845 974.1 139.3 <2e-16 ***
## Residuals 974 6810 7.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_speed)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = speed ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 4-5 3.4252191 2.47622035 4.3742179 0.0000000
## 6-5 -1.2290630 -2.22938491 -0.2287410 0.0055133
## 1-5 -0.6282589 -1.55044151 0.2939238 0.4070770
## 3-5 4.4510197 3.36960729 5.5324320 0.0000000
## 7-5 0.4434247 -0.70288420 1.5897335 0.9146330
## 2-5 5.9769606 4.79932942 7.1545917 0.0000000
## 6-4 -4.6542821 -5.48652102 -3.8220431 0.0000000
## 1-4 -4.0534780 -4.78995506 -3.3170009 0.0000000
## 3-4 1.0258005 0.09766781 1.9539333 0.0193761
## 7-4 -2.9817945 -3.98479079 -1.9787981 0.0000000
## 2-4 2.5517415 1.51309196 3.5903909 0.0000000
## 1-6 0.6008041 -0.20072185 1.4023300 0.2884555
## 3-6 5.6800826 4.69953393 6.6606313 0.0000000
## 7-6 1.6724876 0.62079992 2.7241753 0.0000613
## 2-6 7.2060235 6.12028040 8.2917666 0.0000000
## 3-1 5.0792785 4.17858294 5.9799741 0.0000000
## 7-1 1.0716835 0.09402110 2.0493459 0.0211208
## 2-1 6.6052194 5.59101287 7.6194260 0.0000000
## 7-3 -4.0075950 -5.13669009 -2.8784999 0.0000000
## 2-3 1.5259409 0.36505898 2.6868228 0.0021095
## 2-7 5.5335359 4.31197211 6.7550997 0.0000000
summary(a_direction)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 8463 1410.5 153.5 <2e-16 ***
## Residuals 974 8953 9.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_direction)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = movement ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 4-5 2.6713684 1.583301210 3.759436 0.0000000
## 6-5 3.1148839 1.967972519 4.261795 0.0000000
## 1-5 3.9096763 2.852354966 4.966998 0.0000000
## 3-5 4.2404900 3.000605107 5.480375 0.0000000
## 7-5 7.1523745 5.838082998 8.466666 0.0000000
## 2-5 12.4077166 11.057512774 13.757920 0.0000000
## 6-4 0.4435155 -0.510681578 1.397713 0.8160760
## 1-4 1.2383079 0.393905880 2.082710 0.0003258
## 3-4 1.5691217 0.504978322 2.633265 0.0002934
## 7-4 4.4810061 3.331028494 5.630984 0.0000000
## 2-4 9.7363482 8.545492746 10.927204 0.0000000
## 1-6 0.7947924 -0.124190902 1.713776 0.1412005
## 3-6 1.1256062 0.001365711 2.249847 0.0494893
## 7-6 4.0374906 2.831686288 5.243295 0.0000000
## 2-6 9.2928327 8.047982405 10.537683 0.0000000
## 3-1 0.3308138 -0.701871767 1.363499 0.9648046
## 7-1 3.2426982 2.121766955 4.363629 0.0000000
## 2-1 8.4980403 7.335209668 9.660871 0.0000000
## 7-3 2.9118844 1.617329260 4.206440 0.0000000
## 2-3 8.1672266 6.836226414 9.498227 0.0000000
## 2-7 5.2553421 3.854767640 6.655917 0.0000000
summary(a_dist_3_neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 4018 669.7 3.042 0.00593 **
## Residuals 974 214391 220.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_3_neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_3_neigh ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 4-5 4.2359010 -1.0886598 9.560462 0.2212689
## 6-5 1.7200885 -3.8924321 7.332609 0.9717373
## 1-5 2.6482996 -2.5258037 7.822403 0.7375413
## 3-5 4.7235797 -1.3439161 10.791075 0.2448713
## 7-5 6.6825348 0.2509234 13.114146 0.0356701
## 2-5 7.0240372 0.4166854 13.631389 0.0287314
## 6-4 -2.5158125 -7.1852674 2.153642 0.6876824
## 1-4 -1.5876014 -5.7197638 2.544561 0.9171897
## 3-4 0.4876787 -4.7198088 5.695166 0.9999633
## 7-4 2.4466337 -3.1808920 8.074160 0.8591229
## 2-4 2.7881362 -3.0394293 8.615702 0.7946058
## 1-6 0.9282111 -3.5689219 5.425344 0.9965180
## 3-6 3.0034912 -2.4980873 8.505070 0.6739995
## 7-6 4.9624462 -0.9382728 10.863165 0.1659098
## 2-6 5.3039487 -0.7878457 11.395743 0.1356252
## 3-1 2.0752801 -2.9782654 7.128826 0.8891141
## 7-1 4.0342352 -1.4511493 9.519620 0.3112723
## 2-1 4.3757376 -1.3146855 10.066161 0.2587637
## 7-3 1.9589550 -4.3760748 8.293985 0.9704476
## 2-3 2.3004575 -4.2129192 8.813834 0.9437300
## 2-7 0.3415025 -6.5123429 7.195348 0.9999991
summary(a_dist_10_neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 10048 1674.7 4.073 0.000483 ***
## Residuals 974 400485 411.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_10_neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_10_neigh ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 4-5 5.7655696 -1.5117893 13.042929 0.2256912
## 6-5 2.3643001 -5.3066287 10.035229 0.9709258
## 1-5 4.5153350 -2.5563856 11.587056 0.4898057
## 3-5 5.8992785 -2.3934889 14.192046 0.3523302
## 7-5 10.7303565 1.9399330 19.520780 0.0060072
## 2-5 11.6200694 2.5894523 20.650687 0.0028993
## 6-4 -3.4012696 -9.7832605 2.980721 0.6987152
## 1-4 -1.2502347 -6.8978796 4.397410 0.9948828
## 3-4 0.1337089 -6.9836398 7.251058 1.0000000
## 7-4 4.9647869 -2.7266503 12.656224 0.4759913
## 2-4 5.8544998 -2.1103422 13.819342 0.3119473
## 1-6 2.1510349 -3.9954346 8.297504 0.9461095
## 3-6 3.5349784 -3.9843198 11.054277 0.8078416
## 7-6 8.3660564 0.3012316 16.430881 0.0361993
## 2-6 9.2557694 0.9297918 17.581747 0.0182146
## 3-1 1.3839435 -5.5230045 8.290892 0.9970459
## 7-1 6.2150215 -1.2821435 13.712187 0.1794623
## 2-1 7.1047345 -0.6726678 14.882137 0.0995440
## 7-3 4.8310780 -3.8273424 13.489498 0.6506465
## 2-3 5.7207909 -3.1813855 14.622967 0.4816471
## 2-7 0.8897129 -8.4778000 10.257226 0.9999602
summary(a_min_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 1882 313.6 2.195 0.042 *
## Residuals 586 83746 142.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_min_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $cluster2
## diff lwr upr p adj
## 4-5 0.6524529 -5.113751 6.4186567 0.9998871
## 6-5 -2.3143247 -8.844121 4.2154712 0.9423278
## 1-5 -0.9704284 -6.770981 4.8301246 0.9989173
## 3-5 -1.9062464 -8.313823 4.5013298 0.9754312
## 7-5 -1.6642724 -8.811350 5.4828055 0.9931924
## 2-5 -6.0311486 -13.140920 1.0786230 0.1575832
## 6-4 -2.9667776 -8.050977 2.1174217 0.5985010
## 1-4 -1.6228813 -5.728641 2.4828784 0.9053673
## 3-4 -2.5586992 -7.484943 2.3675448 0.7224896
## 7-4 -2.3167252 -8.172626 3.5391752 0.9049957
## 2-4 -6.6836015 -12.493911 -0.8732916 0.0125424
## 1-6 1.3438963 -3.779227 6.4670195 0.9871547
## 3-6 0.4080784 -5.393388 6.2095444 0.9999932
## 7-6 0.6500523 -5.959085 7.2591898 0.9999504
## 2-6 -3.7168239 -10.285601 2.8519529 0.6337098
## 3-1 -0.9358179 -5.902224 4.0305882 0.9978804
## 7-1 -0.6938440 -6.583571 5.1958826 0.9998571
## 2-1 -5.0607202 -10.905120 0.7836797 0.1398017
## 7-3 0.2419740 -6.246439 6.7303865 0.9999998
## 2-3 -4.1249023 -10.572198 2.3223939 0.4858404
## 2-7 -4.3668763 -11.549586 2.8158333 0.5494162
summary(a_min_SR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 1592 265.4 1.65 0.132
## Residuals 381 61282 160.8
TukeyHSD(a_min_SR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $cluster2
## diff lwr upr p adj
## 4-5 4.2984564 -3.0419390 11.638852 0.5923734
## 6-5 6.0536360 -0.6349611 12.742233 0.1054422
## 1-5 5.0554809 -1.3812215 11.492183 0.2333010
## 3-5 5.7607133 -2.8890369 14.410464 0.4329141
## 7-5 2.4360899 -5.6200109 10.492191 0.9729954
## 2-5 6.5272165 -2.2159285 15.270361 0.2909601
## 6-4 1.7551796 -4.8173285 8.327688 0.9857058
## 1-4 0.7570245 -5.5589608 7.073010 0.9998394
## 3-4 1.4622569 -7.0980416 10.022555 0.9987626
## 7-4 -1.8623665 -9.8223471 6.097614 0.9929172
## 2-4 2.2287601 -6.4258985 10.883419 0.9881855
## 1-6 -0.9981551 -6.5432066 4.546896 0.9983359
## 3-6 -0.2929227 -8.3013331 7.715488 0.9999999
## 7-6 -3.6175461 -10.9807803 3.745688 0.7704306
## 2-6 0.4735805 -7.6356144 8.582775 0.9999977
## 3-1 0.7052324 -7.0940268 8.504492 0.9999693
## 7-1 -2.6193910 -9.7545872 4.515805 0.9313814
## 2-1 1.4717356 -6.4309759 9.374447 0.9979854
## 7-3 -3.3246234 -12.5060212 5.856774 0.9355803
## 2-3 0.7665032 -9.0232818 10.556288 0.9999869
## 2-7 4.0911266 -5.1783108 13.360564 0.8479124
summary(a_n_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 121 20.15 2.013 0.0621 .
## Residuals 586 5867 10.01
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_n_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $cluster2
## diff lwr upr p adj
## 4-5 -0.10026316 -1.6264769 1.4259506 0.9999955
## 6-5 0.92225352 -0.8060697 2.6505768 0.6960651
## 1-5 0.15586207 -1.3794433 1.6911674 0.9999403
## 3-5 0.27743590 -1.4185379 1.9734097 0.9990472
## 7-5 0.02583333 -1.8658737 1.9175404 1.0000000
## 2-5 1.39020408 -0.4916287 3.2720368 0.3048357
## 6-4 1.02251668 -0.3231823 2.3682157 0.2715864
## 1-4 0.25612523 -0.8305979 1.3428483 0.9927346
## 3-4 0.37769906 -0.9261920 1.6815901 0.9785390
## 7-4 0.12609649 -1.4238584 1.6760514 0.9999839
## 2-4 1.49046724 -0.0474206 3.0283551 0.0644766
## 1-6 -0.76639145 -2.1223929 0.5896100 0.6349877
## 3-6 -0.64481762 -2.1803647 0.8907294 0.8772307
## 7-6 -0.89642019 -2.6457438 0.8529034 0.7351342
## 2-6 0.46795056 -1.2706903 2.2065914 0.9853064
## 3-1 0.12157383 -1.1929474 1.4360950 0.9999655
## 7-1 -0.13002874 -1.6889368 1.4288793 0.9999813
## 2-1 1.23434201 -0.3125689 2.7812529 0.2173616
## 7-3 -0.25160256 -1.9689724 1.4657672 0.9994936
## 2-3 1.11276818 -0.5937188 2.8192552 0.4616696
## 2-7 1.36437075 -0.5367674 3.2655089 0.3402357
summary(a_nSR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 200 33.39 2.264 0.0369 *
## Residuals 381 5620 14.75
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_nSR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(n_SR101)))
##
## $cluster2
## diff lwr upr p adj
## 4-5 -1.907407407 -4.130319 0.31550372 0.1469657
## 6-5 -2.373493976 -4.399019 -0.34796863 0.0102054
## 1-5 -1.854368932 -3.803612 0.09487449 0.0742116
## 3-5 -1.966666667 -4.586093 0.65275980 0.2842791
## 7-5 -2.105263158 -4.544913 0.33438690 0.1421648
## 2-5 -2.103448276 -4.751158 0.54426117 0.2210468
## 6-4 -0.466086568 -2.456456 1.52428324 0.9928838
## 1-4 0.053038475 -1.859648 1.96572480 1.0000000
## 3-4 -0.059259259 -2.651597 2.53307831 1.0000000
## 7-4 -0.197855750 -2.608397 2.21268598 0.9999827
## 2-4 -0.196040868 -2.816954 2.42487201 0.9999900
## 1-6 0.519125044 -1.160097 2.19834749 0.9698479
## 3-6 0.406827309 -2.018381 2.83203516 0.9988836
## 7-6 0.268230818 -1.961597 2.49805827 0.9998360
## 2-6 0.270045700 -2.185683 2.72577439 0.9999030
## 3-1 -0.112297735 -2.474168 2.24957231 0.9999993
## 7-1 -0.250894226 -2.411664 1.90987588 0.9998666
## 2-1 -0.249079344 -2.642278 2.14411943 0.9999298
## 7-3 -0.138596491 -2.919023 2.64183020 0.9999991
## 2-3 -0.136781609 -3.101448 2.82788456 0.9999994
## 2-7 0.001814882 -2.805273 2.80890285 1.0000000
summary(a_BV_mean)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 285 47.52 1.363 0.227
## Residuals 625 21787 34.86
TukeyHSD(a_BV_mean)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_mean ~ cluster2, data = subset(master_class_sum2, !is.na(BV_mean)))
##
## $cluster2
## diff lwr upr p adj
## 4-5 0.4726330 -2.0122580 2.957524 0.9977701
## 6-5 1.7553443 -0.8343473 4.345036 0.4124694
## 1-5 1.0062824 -1.2609679 3.273533 0.8459383
## 3-5 0.8528448 -2.0791196 3.784809 0.9781120
## 7-5 0.7435835 -2.2191517 3.706319 0.9898449
## 2-5 -0.8918150 -4.0862936 2.302664 0.9822651
## 6-4 1.2827113 -1.1723422 3.737765 0.7170510
## 1-4 0.5336494 -1.5785080 2.645807 0.9894795
## 3-4 0.3802118 -2.4335402 3.193964 0.9996831
## 7-4 0.2709505 -2.5748507 3.116752 0.9999592
## 2-4 -1.3644480 -4.4507854 1.721889 0.8483617
## 1-6 -0.7490619 -2.9835704 1.485447 0.9557844
## 3-6 -0.9024995 -3.8092192 2.004220 0.9696107
## 7-6 -1.0117608 -3.9495157 1.925994 0.9496924
## 2-6 -2.6471593 -5.8184836 0.524165 0.1724548
## 3-1 -0.1534376 -2.7769735 2.470098 0.9999977
## 7-1 -0.2626989 -2.9205787 2.395181 0.9999491
## 2-1 -1.8980974 -4.8120670 1.015872 0.4632607
## 7-3 -0.1092613 -3.3527650 3.134242 0.9999999
## 2-3 -1.7446598 -5.2011334 1.711814 0.7490384
## 2-7 -1.6353985 -5.1180117 1.847215 0.8077205
summary(a_BV_min)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 795 132.56 4.789 8.67e-05 ***
## Residuals 625 17298 27.68
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_min)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_min ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $cluster2
## diff lwr upr p adj
## 4-5 -0.2976702 -2.51183729 1.9164970 0.9996924
## 6-5 2.3976555 0.09010554 4.7052054 0.0356979
## 1-5 1.2187043 -0.80153359 3.2389422 0.5591439
## 3-5 -0.6217917 -3.23432445 1.9907410 0.9923539
## 7-5 0.7071357 -1.93281547 3.3470868 0.9856916
## 2-5 -1.6066566 -4.45310316 1.2397899 0.6366814
## 6-4 2.6953256 0.50774531 4.8829060 0.0053647
## 1-4 1.5163745 -0.36566755 3.3984165 0.2073611
## 3-4 -0.3241215 -2.83132084 2.1830778 0.9997549
## 7-4 1.0048058 -1.53095106 3.5405627 0.9044382
## 2-4 -1.3089865 -4.05907353 1.4411006 0.7975822
## 1-6 -1.1789511 -3.17001443 0.8121121 0.5815075
## 3-6 -3.0194472 -5.60948561 -0.4294088 0.0107236
## 7-6 -1.6905198 -4.30821226 0.9271726 0.4741915
## 2-6 -4.0043121 -6.83012696 -1.1784972 0.0006271
## 3-1 -1.8404960 -4.17820294 0.4972109 0.2318372
## 7-1 -0.5115687 -2.87987782 1.8567405 0.9954857
## 2-1 -2.8253610 -5.42185939 -0.2288625 0.0228434
## 7-3 1.3289274 -1.56120312 4.2190579 0.8227337
## 2-3 -0.9848649 -4.06476266 2.0950328 0.9648489
## 2-7 -2.3137923 -5.41698176 0.7893972 0.2939195
summary(a_BV_contact)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 1.08 0.1796 1.11 0.355
## Residuals 625 101.14 0.1618
TukeyHSD(a_BV_contact)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_contact ~ cluster2, data = subset(master_class_sum2, !is.na(BV_contact)))
##
## $cluster2
## diff lwr upr p adj
## 4-5 0.045250056 -0.1240529 0.21455305 0.9858546
## 6-5 0.073128252 -0.1033151 0.24957162 0.8839802
## 1-5 0.039593646 -0.1148808 0.19406813 0.9886482
## 3-5 0.060006090 -0.1397573 0.25976951 0.9742434
## 7-5 0.046885338 -0.1549746 0.24874527 0.9932967
## 2-5 -0.096299061 -0.3139484 0.12135023 0.8478656
## 6-4 0.027878197 -0.1393919 0.19514827 0.9989423
## 1-4 -0.005656410 -0.1495640 0.13825113 0.9999998
## 3-4 0.014756035 -0.1769532 0.20646529 0.9999884
## 7-4 0.001635282 -0.1922576 0.19552815 1.0000000
## 2-4 -0.141549117 -0.3518304 0.06873219 0.4213091
## 1-6 -0.033534606 -0.1857783 0.11870908 0.9949759
## 3-6 -0.013122162 -0.2111656 0.18492127 0.9999952
## 7-6 -0.026242915 -0.2264009 0.17391504 0.9997340
## 2-6 -0.169427313 -0.3854990 0.04664441 0.2362555
## 3-1 0.020412444 -0.1583368 0.19916172 0.9998811
## 7-1 0.007291691 -0.1737975 0.18838092 0.9999998
## 2-1 -0.135892707 -0.3344301 0.06264468 0.4000286
## 7-3 -0.013120753 -0.2341103 0.20786877 0.9999975
## 2-3 -0.156305151 -0.3918049 0.07919464 0.4393201
## 2-7 -0.143184398 -0.3804652 0.09409636 0.5587602
summary(a_BV_sd)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 9.93 1.6556 15.81 <2e-16 ***
## Residuals 625 65.46 0.1047
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_sd)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_sd ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $cluster2
## diff lwr upr p adj
## 4-5 0.15450444 0.018296776 0.29071210 0.0146745
## 6-5 -0.09589194 -0.237844183 0.04606030 0.4168015
## 1-5 -0.07220809 -0.196485916 0.05206974 0.6037478
## 3-5 0.28437055 0.123656857 0.44508424 0.0000047
## 7-5 0.02364182 -0.138758558 0.18604219 0.9995126
## 2-5 0.17833357 0.003230335 0.35343681 0.0427288
## 6-4 -0.25039638 -0.384968519 -0.11582424 0.0000011
## 1-4 -0.22671252 -0.342489037 -0.11093601 0.0000002
## 3-4 0.12986611 -0.024367844 0.28410007 0.1643859
## 7-4 -0.13086262 -0.286853339 0.02512810 0.1677768
## 2-4 0.02382913 -0.145346412 0.19300468 0.9995971
## 1-6 0.02368386 -0.098799253 0.14616697 0.9975548
## 3-6 0.38026249 0.220932570 0.53959242 0.0000000
## 7-6 0.11953376 -0.041497340 0.28056486 0.2992373
## 2-6 0.27422551 0.100391466 0.44805956 0.0000766
## 3-1 0.35657864 0.212771249 0.50038602 0.0000000
## 7-1 0.09584990 -0.049840026 0.24153983 0.4505527
## 2-1 0.25054166 0.090814337 0.41026898 0.0000866
## 7-3 -0.26072873 -0.438519250 -0.08293822 0.0003357
## 2-3 -0.10603698 -0.295501301 0.08342734 0.6460766
## 2-7 0.15469175 -0.036205391 0.34558890 0.2014832
The output in this section shows a boxplot for each feature in each
behavioral cluster
p_speed
p_direction
p_dist3_neigth
p_dist_10_neigh
p_min_MG
p_min_SR101
p_n_MG
p_nSR101
p_BV_mean
p_BV_min
p_BV_contact
p_BV_sd
Select a position of interest to save .txt information
about the specified position of interest in a .txt
file.
This code section uses the master_distance.rds file
generated in the File Import section
# Determine position of interest
position_interest<-"2430F11.CL3_1"
names_cell<-master_class[!duplicated(master_class$mouse),]
names_cell$position
## [1] 2430F11.CL3_1 2430F13.CL2_2 2430F16.CL3_1 2430F17.CL3_2 2430F18.CL2_1
## [6] 2430F12.CL2_2
## 36 Levels: 2430F11.CL1_1 2430F12.CL1_1 2430F13.CL1_1 ... 2430F18.CL3_2
Tracks_1<-master_class[which(master_class$position==position_interest),]
Tracks_1<-Tracks_1[!duplicated(Tracks_1$Track2),c(2,which( colnames(master_class)=="cluster2" ))]
##Join with information on unique TrackID for that experiment
# Generated during the data import
master_distance<-readRDS("D:/BEHAV3D_Tumor_Profiler/results/master_distance.rds")
master_distance<-master_distance[c("TrackID", "Track2")]%>% distinct(TrackID, .keep_all = TRUE)
Tracks_1<-left_join(Tracks_1,master_distance)
Tracks_1$Track2<-NULL
Track_1_list<-split(Tracks_1,Tracks_1$cluster2)
write(paste(as.character(Track_1_list), sep="' '", collapse=", "), paste0(position_interest,".txt"))
Then, we plot the cell trajectories combined with the behavioral cluster information:
###Plot cell trajectories:
## bind cluster information data to original dataset:
Tracks_1<-master_class
###plot all the tracks
Tracks_1<-Tracks_1[!duplicated(Tracks_1$Track2),c("cluster2", "Track2")]
Tracks_1$cluster2<-as.factor(Tracks_1$cluster2)
master_traject<-left_join(master_cor2,Tracks_1)
master_traject<-na.omit(master_traject)
master_traject<-subset(master_traject, cluster2!=0)
master_traject<-master_traject%>%group_by(position, mouse)%>%mutate(x_new=x-min(x), y_new=y-min(y))
master_traject$cluster2 = factor(master_traject$cluster2, levels=df1$cluster2)
g_all_track<-ggplot(master_traject)+geom_path(aes(x_new, y_new, group=Track2,
color=as.factor(cluster2)), size=0.2,
arrow = arrow(ends = "last",type = "open",length = unit(0.01, "inches")))+scale_color_manual(values=mypalette_1 )+
theme_bw()+theme(aspect.ratio = 1)+facet_wrap(class~position)
g_all_track
g_track_int1<-ggplot(subset(master_traject, position%in%position_interest))+geom_path(aes(x_new, y_new, group=Track2, color=as.factor(cluster2)), size=0.5,arrow = arrow(ends = "last",type = "open",length = unit(0.00, "inches")))+
scale_color_manual(values=mypalette_1)+ggtitle(paste0(position_interest))+
theme_bw()+theme(aspect.ratio = 1)
g_track_int1
g_track_int2<-ggplot(subset(master_traject, position%in%position_interest))+geom_path(aes(x_new, y_new, group=Track2), size=0.5,arrow = arrow(ends = "last",type = "open",length = unit(0.00, "inches")))+
scale_color_manual(values=mypalette_1)+ggtitle(paste0(position_interest))+
theme_bw()+theme(aspect.ratio = 1)
g_track_int2
We firstly create an output directory for the module:
# Create a directory for the results
folder <- "/content/results/large_scale_phenotyping_module"
if (file.exists(folder)) {
cat("The '/content/results/large_scale_phenotyping_module' folder already exists")
} else {
dir.create(folder)
}
Dynamic Time Warping (DTW) is an algorithm used to measure similarity between two time series by aligning them in a way that minimizes the distance between corresponding points. It is particularly useful for comparing time series that may vary in speed or timing, like on this case.
If the matrix_distmat.rds file exists, it will skip the
DTW analysis time, as it has been run previously. Else, it will run
normally and generate the file at the end.
###MULTIVARIATE
list_master_pca <- split(master_pca3[,-c(1)],master_pca3$Track2) ## split into list
if ( ! file.exists("D:/BEHAV3D_Tumor_Profiler/results/matrix_distmat.rds")){
##parallel working
# load parallel
library(parallel)
# create multi-process workers
workers <- makeCluster(detectCores()-2)
# load dtwclust in each one, and make them use 1 thread per worker
invisible(clusterEvalQ(workers, {
library(dtwclust)
RcppParallel::setThreadOptions(1L)
}))
# register your workers, e.g. with doParallel
require(doParallel)
registerDoParallel(workers)
###MULTIVARIATE
set.seed(123)
distmat <- proxy::dist(list_master_pca, method = "dtw")
saveRDS(distmat, file = "distmat.rds")
matrix_distmat<-as.matrix(distmat)
saveRDS(matrix_distmat, file = "matrix_distmat.rds")
}else{
matrix_distmat <- readRDS("matrix_distmat.rds")
}
Track2<-as.numeric(names(list_master_pca))
Now, we can perform the UMAP representation and cluster in the desired number of clusters. Any of the parameters can be modified to your convenience
# Modify the parameters according to the characteristics of the experiment
umap_dist<- umap(matrix_distmat,n_components=2,input="dist",init = "random",
n_neighbors=7,min_dist=0.5, spread=6,n_epochs=1000 , local_connectivity=1)
umap_1 <- as.data.frame(umap_dist$`layout`)
Here, we can see the Raw UMAP plot:
plot(umap_dist$`layout`, # Plot the result
col=rgb(0,0,0,alpha=0.1),
pch=19,
asp=0.4, xlab="UMAP 1", ylab="UMAP 2",
main="Raw UMAP")
Here we determine the number of clusters for the analysis.
(n_cluster)
umap_1 <- as.data.frame(umap_dist$`layout`)
Track2_umap<-cbind(Track2,umap_1)
positiontype<-master_norm[,c("Track2", "position","class","mouse")]
positiontype<- positiontype[!duplicated(positiontype$Track2),]
umap_2 <- left_join(Track2_umap ,positiontype)
# Clustering
n_cluster=7
hc <- hclust(dist(umap_dist$`layout`, method = "euclidean"), method = "ward.D2")
hierarchical_clusters <- cutree(hc, k = n_cluster)
umap_3 <- cbind(hierarchical_clusters, umap_2)
colnames(umap_3)[1]<- "cluster2"
UMAP positions colored by large-scale regions
ggplot(umap_3, aes(x=V1, y=V2, color=as.factor(class))) +
geom_point(size=0.5, alpha=0.8) + labs(color="Environment_cluster")+
xlab("") + ylab("")+
ggtitle("umap Cluster") +scale_color_manual(values=c("cyan","gold","red"))+
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+facet_wrap(mouse~position)
UMAP colored by large-scale regions
ggplot(umap_3, aes(x=V1, y=V2, color=as.factor(class))) +
geom_point(size=2, alpha=0.6) + labs(color="Tumor region phenotype")+
xlab("") + ylab("")+
ggtitle("distribution of large scale regions")+scale_color_manual(values=c("cyan","gold","red"))+
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)
### calculate average stats per track
master_class <- left_join(master_cor2 ,umap_3[c(1:4)], by = c("Track2" = "Track2"))
master_class<-master_class %>%
filter(cluster2 != 0)
master_class$cluster2<-as.numeric(master_class$cluster2)
master_class_sum<-master_class%>%group_by(position,class,mouse, Track2)%>%
arrange(Time)%>%summarize(cluster2=mean(cluster2), V1=mean(V1), V2=mean(V2),
dist_tumor=mean(dist_tumor), dist_3_neigh=mean(dist_3_neigh),
dist_10_neigh=mean(dist_10_neigh), speed=mean(speed), disp2=mean(disp2),
disp_d=mean(disp_d),disp_l=mean(disp_l), movement=last(dist_tumor),
distance_to_tumor=last(dist_tumor_1),raw_dist_tumor=last(dist_tumor),
Time=first(Time))%>%ungroup()
## Join the information on the SR101 and MG
master_class_sum <- left_join(master_class_sum,master_distance_MG[c("Track2","n_MG","min_MG")] ,by = c("Track2" = "Track2"))
master_class_sum <- left_join(master_class_sum,master_distance_SR101[c("Track2","n_SR101","min_SR101")] ,by = c("Track2" = "Track2"))
master_class_sum <- left_join(master_class_sum,BV_df_sum ,by = c("Track2" = "Track2"))
mid <- 0
master_class_sum$movement2<-ifelse(master_class_sum$movement>0,"away from tumor", ifelse(master_class_sum$movement==0, "no movement", "towards the tumor"))
master_class_sum<-master_class_sum[!is.na(master_class_sum$cluster2),]
saveRDS(master_class_sum,"master_class_sum.rds")
write.csv(master_class_sum,"master_class_sum.csv")
Provides a pie chart of the cluster distribution along the mice experiments
sum_all<-master_class_sum%>%group_by(cluster2)%>%
summarise(displacement2 = median(disp2),
speed = median(speed),displacement_delta=median(disp_d),displacement_length=median(disp_l),max_speed=quantile(speed, 0.75), persistance=(displacement_length/displacement_delta),nearest_10_cell= median(dist_10_neigh),dist_tumor_core=median(distance_to_tumor),
n_SR101=mean(n_SR101, na.rm = T),
min_SR101=mean(min_SR101, na.rm = T),n_MG=mean(n_MG, na.rm = T),min_MG=mean(min_MG, na.rm = T),min_BV_dist=mean(BV_min, na.rm = T),sd_BV_dist=mean(BV_sd, na.rm = T),mean_BV_dist=mean(BV_mean, na.rm = T),
movement=median(movement) )
Cluster_movement<-sum_all[,c(1,length(names(sum_all)))]
Cluster_movement <-Cluster_movement[order(Cluster_movement$movement),]
stdize = function(x, ...) {(x - min(x, ...)) / (max(x, ...) - min(x, ...))}
sum_all[-c(1)] <- lapply(sum_all[-c(1)], stdize, na.rm = T)
df<-sum_all[,-c(1)]
df<-df[order(match(rownames(df), Cluster_movement$cluster2)), , drop = FALSE]%>%ungroup()
df<-as.data.frame(df)
rownames(df)<-Cluster_movement$cluster2
## plot clusters giving them a color according to direction
# Convert df to a data frame
# Add cluster2 as a column
df1<-as.data.frame(df)
df1 <- mutate(df1, cluster2 = row.names(df))
# Order df1 by movement and reorder cluster2 accordingly
df1 <- df1 %>%
arrange(movement) %>%
mutate(cluster2 = factor(cluster2, levels = unique(cluster2)))
# Make a color palette that adjusts to the cluster direction
# Generate a continuous palette with 100 colors
continuous_palette <- colorRampPalette(c("cyan4", "darkturquoise","darkorchid4","darkorchid1", "deeppink4","deeppink1", "goldenrod3", "gold"))(50)
# Select 8 colors from the continuous palette
mypalette_1<- continuous_palette[seq(1, length(continuous_palette), length.out = n_cluster)]
### plot enrichment on edge and core:
Number_cell_exp<-umap_3%>%group_by(position,class,mouse)%>%
summarise(total_cell = n())
Percentage_clus<-left_join(Number_cell_exp,umap_3)
Percentage_clus <- Percentage_clus%>%group_by(cluster2,position,class,mouse)%>%
summarise(total_cell = mean(total_cell), num_cluster=n())%>%mutate(percentage=num_cluster*100/total_cell)%>%ungroup()
Percentage_clus_2 <- Percentage_clus%>%group_by(cluster2, position,class,mouse)%>%summarise(se_percentage=sd(percentage)/sqrt(length(percentage)),percentage = mean(percentage))
Plots the pie chart of:
### Plot circular map
Percentage_clus_2$cluster2<-as.factor(Percentage_clus_2$cluster2)
Percentage_clus_2$cluster2<- factor(Percentage_clus_2$cluster2, levels = levels(df1$cluster2))
Per1<-ggplot(Percentage_clus_2, aes(fill=as.factor(cluster2), y=percentage, x="")) +
geom_bar( stat="identity", position="fill", width = 0.5)+ coord_flip()+ scale_y_reverse()
Per1 <- Per1 + facet_grid(class~mouse)
Per1<-Per1+theme_void()+ scale_fill_manual(values=mypalette_1)+theme(strip.text.x = element_text(size = 8, angle = 90))
Per1
- Behavioral cluster distribution in large-scale regions
Per3<-ggplot(Percentage_clus_2, aes(fill=as.factor(cluster2), y=percentage, x="")) +
geom_bar( stat="identity", position="fill", width = 0.5)+ coord_flip()+ scale_y_reverse()
Per3 <- Per3 + facet_grid(class~.)
Per3<-Per3+theme_void()+theme(strip.text.x = element_text(size = 8, angle = 90))+ scale_fill_manual(values=mypalette_1)
Per3
### Large-scale regions’ statistics
# Additional Statistics
# Create an empty dataframe to store results
results_df <- data.frame(cluster2 = character(), p_value = numeric(),contrast=character(), pairwise_p_values=numeric())
# Create an empty list to store plots
plots_list <- list()
# Iterate over unique clusters
for (cluster in unique(Percentage_clus_2$cluster2)) {
# Subset data for the current cluster
cluster_data <- subset(Percentage_clus_2, cluster2 == cluster)
# Calculate mean percentage for each mouse and subtract from original percentage
normalized_data <- cluster_data %>%
group_by(mouse) %>%
mutate(normalized_percentage = (percentage - mean(percentage)) / sd(percentage)) %>%
ungroup()
# Fit model with normalized percentage values
model <- lm(normalized_percentage ~ class, data = normalized_data)
# Extract estimates and p-value from model summary
model_summary <- summary(model)
p_value <- anova(model)$`Pr(>F)`[1]
# Conduct pairwise comparisons using emmeans
pairwise_comp <- emmeans(model, pairwise ~ class, adjust = "tukey")
# Extract pairwise comparisons and p-values
pairwise_p_values <- summary(pairwise_comp)[["contrasts"]][["p.value"]]
contrast<-summary(pairwise_comp)[["contrasts"]][["contrast"]]
# Append cluster name, estimates, and p-value to results dataframe
results_df <- rbind(results_df, data.frame(cluster2 = cluster, p_value = p_value, contrast=contrast,pairwise_p_values=pairwise_p_values))
# Create boxplot of normalized percentages by class
plot <- ggplot(normalized_data, aes(x = class, y = normalized_percentage)) +
geom_boxplot(width=0.5) +geom_jitter(width = 0.3)+
labs(title = paste("Cluster:", cluster)) +
theme_bw()+theme(aspect.ratio = 1)
# Store plot in the list
plots_list[[cluster]] <- plot
}
Display the results dataframe:
# Display the results dataframe
print(results_df)
## cluster2 p_value contrast pairwise_p_values
## 1 1 0.2425891 CL1 - CL2 0.5948747
## 2 1 0.2425891 CL1 - CL3 0.7249399
## 3 1 0.2425891 CL2 - CL3 0.2159649
## 4 2 0.5799852 CL1 - CL2 0.9841552
## 5 2 0.5799852 CL1 - CL3 0.6946238
## 6 2 0.5799852 CL2 - CL3 0.5917744
## 7 3 0.5780029 CL1 - CL2 0.6283284
## 8 3 0.5780029 CL1 - CL3 0.9992766
## 9 3 0.5780029 CL2 - CL3 0.6314219
## 10 4 0.4313674 CL1 - CL2 0.8297038
## 11 4 0.4313674 CL1 - CL3 0.6933631
## 12 4 0.4313674 CL2 - CL3 0.4036878
## 13 5 0.2208125 CL1 - CL2 0.6662128
## 14 5 0.2208125 CL1 - CL3 0.1951935
## 15 5 0.2208125 CL2 - CL3 0.5598163
## 16 6 0.9260553 CL1 - CL2 0.9996268
## 17 6 0.9260553 CL1 - CL3 0.9383376
## 18 6 0.9260553 CL2 - CL3 0.9407299
## 19 7 0.8547917 CL1 - CL2 0.9956433
## 20 7 0.8547917 CL1 - CL3 0.8625180
## 21 7 0.8547917 CL2 - CL3 0.8929672
Plot the statistics in large-scale regions (environmental
clusters)
plots_list
## $`1`
##
## $`2`
##
## $`3`
##
## $`4`
##
## $`5`
##
## $`6`
##
## $`7`
This section analyzes the differences between large-scale regions for different features and provides statistical support in each case.
####Differences based on environmental clusters
### plot differences of the values per cluster
master_class_sum<-as.data.frame(master_class_sum)
master_class_sum$cluster2<-as.character(master_class_sum$cluster2)
df1$mean_movement<-df1$movement
master_class_sum2<-left_join(master_class_sum, df1[,c("cluster2","mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels=df1$cluster2)
#### try an anova between clusters based on distance to speed
p_class_speed <- ggplot(master_class_sum2,aes(x=as.factor(class) , y=speed, group=class,fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red"))+ggtitle("speed per env cluster")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(0,20))
a_class_speed <- aov(speed ~ class, data=master_class_sum2)
#### try an anova between clusters based on squared displacement
p_class_disp2 <- ggplot(master_class_sum2,aes(x=as.factor(class) , y=disp2, group=class,fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red"))+ggtitle("speed per env cluster")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(0,180))
a_class_disp2 <- aov(disp2 ~ class, data=master_class_sum2)
#### try an anova between clusters based on raw tumor movement
p_class_movement <- ggplot(master_class_sum2,aes(x=as.factor(class) , y=movement, group=class,fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red")) +ggtitle("movement direction per env cluster")+theme_classic()+
theme(aspect.ratio=1)+scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")))+coord_cartesian(ylim = c(-10,17))
a_class_movement <- aov(movement ~ class, data=master_class_sum2)
#### try an anova between clusters based on distance to dist_10_neigh
p_class_dist_10neigh <- ggplot(master_class_sum2,aes(x=as.factor(class) , y=dist_10_neigh, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red")) +ggtitle("dist 10 neigh per cluster")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(20,70))
a_class_dist_10neigh <- aov(dist_10_neigh ~ class, data=master_class_sum2)
#### try an anova between clusters based on dist to MG
p_class_minMG <- ggplot(subset(master_class_sum2, !is.na(min_MG)) ,aes(x=as.factor(class) , y=min_MG, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red")) +ggtitle("distance to closest MG")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(0,50))
a_class_minMG <- aov(min_MG ~ class, data= subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on dist to SR101
p_classmin_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)) ,aes(x=as.factor(class) , y=min_SR101, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA) +stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red"))+ggtitle("distance to closest SR101")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(0,50))
a_classmin_SR101 <- aov(min_SR101 ~ class, data= subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on n of MG
p_class_n_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)),aes(x=as.factor(class) , y=n_MG, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA) +stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red"))+ggtitle("n of MG per cluster")+theme_classic()+
theme(aspect.ratio=1)
a_class_n_MG <- aov(n_MG ~ class, data= subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on n of SR101
p_classn_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)),aes(x=as.factor(class) , y=n_SR101, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red"))+ggtitle("n SR101 per cluster")+theme_classic()+
theme(aspect.ratio=1)
a_classn_SR101 <- aov(n_SR101 ~ class, data= subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on BV distance min
p_class_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)) ,aes(x=as.factor(class) , y=BV_min, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+ scale_fill_manual(values=c("cyan","gold","red")) +ggtitle("min distance to BV")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(0,18))
a_class_BV_min <- aov(BV_min ~ class, data= subset(master_class_sum2, !is.na(BV_min)))
#### try an anova between clusters based on BV distance mean
p_class_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)) ,aes(x=as.factor(class) , y=BV_mean, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+ scale_fill_manual(values=c("cyan","gold","red")) +ggtitle("mean distance to BV")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(0,18))
a_class_BV_mean <- aov(BV_mean ~ class, data= subset(master_class_sum2, !is.na(BV_mean)))
##relation between position relative to the tumor and amount of MG:
pcor_MG_position <- ggplot(subset(master_class_sum2, !is.na(min_MG)),aes(x=distance_to_tumor , y=n_MG)) +geom_jitter()+
geom_point(alpha=0.5,stat = "summary") +geom_smooth(method = "lm")+ggtitle("scatter plot movement vs min_MG")+theme_classic()+
theme(aspect.ratio=1)
cor_18<-cor.test(master_class_sum2$n_MG,master_class_sum2$distance_to_tumor, method = "pearson", use = "pairwise.complete.obs")
Statistical information between large-scale regions (environmental clusters) based on ANOVA and Tukey’s HSD (Honestly Significant Difference) test:
summary(a_class_speed)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 167 83.32 6.525 0.00153 **
## Residuals 978 12488 12.77
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_speed)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = speed ~ class, data = master_class_sum2)
##
## $class
## diff lwr upr p adj
## CL2-CL1 0.1001108 -0.5744387 0.7746603 0.9353020
## CL3-CL1 0.9064598 0.2405857 1.5723339 0.0041126
## CL3-CL2 0.8063490 0.1702051 1.4424928 0.0084220
summary(a_class_disp2)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 2820 1410 0.064 0.938
## Residuals 978 21460215 21943
TukeyHSD(a_class_disp2)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = disp2 ~ class, data = master_class_sum2)
##
## $class
## diff lwr upr p adj
## CL2-CL1 4.144633 -23.81786 32.10713 0.9354622
## CL3-CL1 3.097067 -24.50580 30.69993 0.9624874
## CL3-CL2 -1.047566 -27.41801 25.32288 0.9952179
summary(a_class_movement)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 23 11.67 0.656 0.519
## Residuals 978 17392 17.78
TukeyHSD(a_class_movement)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = movement ~ class, data = master_class_sum2)
##
## $class
## diff lwr upr p adj
## CL2-CL1 0.1138014 -0.6822461 0.9098489 0.9398244
## CL3-CL1 0.3680570 -0.4177525 1.1538665 0.5146473
## CL3-CL2 0.2542556 -0.4964687 1.0049800 0.7062059
summary(a_class_dist_10neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 1320 659.9 1.577 0.207
## Residuals 978 409213 418.4
TukeyHSD(a_class_dist_10neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_10_neigh ~ class, data = master_class_sum2)
##
## $class
## diff lwr upr p adj
## CL2-CL1 -2.5151366 -6.376437 1.346164 0.2777655
## CL3-CL1 -0.1387336 -3.950374 3.672907 0.9959840
## CL3-CL2 2.3764030 -1.265054 6.017860 0.2764354
summary(a_class_minMG)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 2813 1406.4 10.02 5.26e-05 ***
## Residuals 590 82815 140.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_minMG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -5.075902 -7.941646 -2.210159 0.0001074
## CL3-CL1 -3.773329 -6.454107 -1.092551 0.0028689
## CL3-CL2 1.302574 -1.640317 4.245465 0.5519149
summary(a_classmin_SR101 )
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 3438 1719.0 11.13 1.99e-05 ***
## Residuals 385 59436 154.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_classmin_SR101 )
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -9.024564 -13.534491 -4.514638 0.0000104
## CL3-CL1 -7.309367 -11.897598 -2.721135 0.0006007
## CL3-CL2 1.715198 -1.496459 4.926854 0.4207669
summary(a_class_n_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 644 322.0 35.55 2.64e-15 ***
## Residuals 590 5344 9.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_n_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 2.592120 1.8641533 3.3200858 0.0000000
## CL3-CL1 1.337224 0.6562435 2.0182050 0.0000145
## CL3-CL2 -1.254895 -2.0024589 -0.5073317 0.0002641
summary(a_classn_SR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 422 210.98 15.05 5.11e-07 ***
## Residuals 385 5398 14.02
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_classn_SR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 2.6028601 1.2436788 3.962041 0.0000261
## CL3-CL1 3.2029326 1.8201521 4.585713 0.0000003
## CL3-CL2 0.6000725 -0.3678421 1.567987 0.3121099
summary(a_class_BV_min)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 591 295.71 10.63 2.89e-05 ***
## Residuals 629 17502 27.82
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_BV_min)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_min ~ class, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -1.059503 -2.238974 0.1199684 0.0885449
## CL3-CL1 -2.422248 -3.657506 -1.1869908 0.0000148
## CL3-CL2 -1.362746 -2.580036 -0.1454545 0.0237365
summary(a_class_BV_mean)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 463 231.36 6.735 0.00128 **
## Residuals 629 21609 34.35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_BV_mean)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_mean ~ class, data = subset(master_class_sum2, !is.na(BV_mean)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -1.3233714 -2.633946 -0.01279692 0.0471867
## CL3-CL1 -2.1065180 -3.479080 -0.73395634 0.0009792
## CL3-CL2 -0.7831466 -2.135745 0.56945151 0.3625740
Statistical information based on correlation test between position
relative to the tumor and amount of MG:
cor_18
##
## Pearson's product-moment correlation
##
## data: master_class_sum2$n_MG and master_class_sum2$distance_to_tumor
## t = -0.14404, df = 591, p-value = 0.8855
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08639938 0.07462654
## sample estimates:
## cor
## -0.005924827
The output in this section shows a boxplot for each feature in each behavioral cluster according to large-scale regions:
p_class_speed
p_class_disp2
p_class_movement
p_class_dist_10neigh
p_class_minMG
p_classmin_SR101
p_class_n_MG
p_classn_SR101
p_class_BV_min
p_class_BV_mean
We firstly create an output directory for the module:
# Create a directory for the results
folder <- "/content/results/small_scale_phenotyping_module"
if (file.exists(folder)) {
cat("The '/content/results/small_scale_phenotyping_module' folder already exists")
} else {
dir.create(folder)
}
Dynamic Time Warping (DTW) is an algorithm used to measure similarity between two time series by aligning them in a way that minimizes the distance between corresponding points. It is particularly useful for comparing time series that may vary in speed or timing, like on this case.
If the matrix_distmat.rds file exists, it will skip the
DTW analysis time, as it has been run previously. Else, it will run
normally and generate the file at the end.
###MULTIVARIATE
list_master_pca <- split(master_pca3[,-c(1)],master_pca3$Track2) ## split into list
if ( ! file.exists("D:/BEHAV3D_Tumor_Profiler/results/matrix_distmat.rds")){
##parallel working
# load parallel
library(parallel)
# create multi-process workers
workers <- makeCluster(detectCores()-2)
# load dtwclust in each one, and make them use 1 thread per worker
invisible(clusterEvalQ(workers, {
library(dtwclust)
RcppParallel::setThreadOptions(1L)
}))
# register your workers, e.g. with doParallel
require(doParallel)
registerDoParallel(workers)
###MULTIVARIATE
set.seed(123)
distmat <- proxy::dist(list_master_pca, method = "dtw")
saveRDS(distmat, file = "distmat.rds")
matrix_distmat<-as.matrix(distmat)
saveRDS(matrix_distmat, file = "matrix_distmat.rds")
}else{
matrix_distmat <- readRDS("matrix_distmat.rds")
}
Track2<-as.numeric(names(list_master_pca))
Now, we can perform the UMAP representation and cluster in the desired number of clusters. Any of the parameters can be modified to your convenience.
However, in this module, the output of the UMAP is not relevant, as it does not reflect any small-scale phenotype.
# Adjust parameters according to the characteristics of the experiment
umap_dist<- umap(matrix_distmat,n_components=2,input="dist",init = "random",
n_neighbors=7,min_dist=0.5, spread=6,n_epochs=1000 , local_connectivity=1)
umap_1 <- as.data.frame(umap_dist$`layout`)
Track2_umap<-cbind(Track2,umap_1)
positiontype<-master_norm[,c("Track2", "position","class","mouse")]
positiontype<- positiontype[!duplicated(positiontype$Track2),]
umap_2 <- left_join(Track2_umap ,positiontype)
# Clustering
set.seed(123)
n_cluster=7
hc <- hclust(dist(umap_dist$`layout`, method = "euclidean"), method = "ward.D2")
hierarchical_clusters <- cutree(hc, k = n_cluster)
umap_3 <- cbind(hierarchical_clusters, umap_2)
colnames(umap_3)[1]<- "cluster2"
### calculate average stats per track
master_class <- left_join(master_cor2 ,umap_3[c(1:4)], by = c("Track2" = "Track2"))
master_class<-master_class %>%
filter(cluster2 != 0)
master_class$cluster2<-as.numeric(master_class$cluster2)
master_class_sum<-master_class%>%group_by(position,class,mouse, Track2)%>%
arrange(Time)%>%summarize(cluster2=mean(cluster2), V1=mean(V1), V2=mean(V2),
dist_tumor=mean(dist_tumor), dist_3_neigh=mean(dist_3_neigh),
dist_10_neigh=mean(dist_10_neigh), speed=mean(speed), disp2=mean(disp2),
disp_d=mean(disp_d),disp_l=mean(disp_l), movement=last(dist_tumor),
distance_to_tumor=last(dist_tumor_1),raw_dist_tumor=last(dist_tumor),
Time=first(Time))%>%ungroup()
## Join the information on the SR101 and MG
master_class_sum <- left_join(master_class_sum,master_distance_MG[c("Track2","n_MG","min_MG")] ,by = c("Track2" = "Track2"))
master_class_sum <- left_join(master_class_sum,master_distance_SR101[c("Track2","n_SR101","min_SR101")] ,by = c("Track2" = "Track2"))
master_class_sum <- left_join(master_class_sum,BV_df_sum ,by = c("Track2" = "Track2"))
mid <- 0
master_class_sum$movement2<-ifelse(master_class_sum$movement>0,"away from tumor", ifelse(master_class_sum$movement==0, "no movement", "towards the tumor"))
master_class_sum<-master_class_sum[!is.na(master_class_sum$cluster2),]
saveRDS(master_class_sum,"master_class_sum.rds")
write.csv(master_class_sum,"master_class_sum.csv")
In this section, we project the relevant small-scale features in a Heatmap
###Create heatmap
sum_all<-master_class_sum%>%group_by(cluster2)%>%
summarise(displacement2 = median(disp2),
speed = median(speed),displacement_delta=median(disp_d),displacement_length=median(disp_l),max_speed=quantile(speed, 0.75), persistance=(displacement_length/displacement_delta),nearest_10_cell= median(dist_10_neigh),dist_tumor_core=median(distance_to_tumor),
n_SR101=mean(n_SR101, na.rm = T),
min_SR101=mean(min_SR101, na.rm = T),n_MG=mean(n_MG, na.rm = T),min_MG=mean(min_MG, na.rm = T),min_BV_dist=mean(BV_min, na.rm = T),sd_BV_dist=mean(BV_sd, na.rm = T),mean_BV_dist=mean(BV_mean, na.rm = T),
movement=median(movement) )
Cluster_movement<-sum_all[,c(1,length(names(sum_all)))]
Cluster_movement <-Cluster_movement[order(Cluster_movement$movement),]
stdize = function(x, ...) {(x - min(x, ...)) / (max(x, ...) - min(x, ...))}
sum_all[-c(1)] <- lapply(sum_all[-c(1)], stdize, na.rm = T)
df<-sum_all[,-c(1)]
df<-df[order(match(rownames(df), Cluster_movement$cluster2)), , drop = FALSE]%>%ungroup()
df<-as.data.frame(df)
rownames(df)<-Cluster_movement$cluster2
Heatmap of all the included features:
heat_m<-pheatmap(df, clustering_method = "complete", cluster_cols=F,cluster_rows=F,cellwidth=20, cellheight=20,
treeheight_row = 0, treeheight_col = 0,fontsize = 8,na_col = "grey50", main=" All features",
angle_col = 90,color = colorRampPalette(c("deepskyblue1", "grey95", "deeppink2"))(50))
df_other_features<-df[,!colnames(df)%in%c("displacement2", "speed", "max_speed", "displacement_delta" , "displacement_length", "persistance", "movement")]
rownames(df_other_features)<-rownames(df)
Heatmap of the small-scale features:
pheatmap(df_other_features, clustering_method = "complete", cluster_cols=F,cluster_rows=F,cellwidth=20, cellheight=20,
treeheight_row = 0, treeheight_col = 0,fontsize = 8,na_col = "grey50",main="Small-scale TME features",
angle_col = 90,color = colorRampPalette(c("deepskyblue1", "grey95", "deeppink2"))(50))
Plots the behavioral clusters according to relevant small-scale
features
## plot clusters giving them a color according to direction
# Convert df to a data frame
# Add cluster2 as a column
df1<-as.data.frame(df)
df1 <- mutate(df1, cluster2 = row.names(df))
# Order df1 by movement and reorder cluster2 accordingly
df1 <- df1 %>%
arrange(movement) %>%
mutate(cluster2 = factor(cluster2, levels = unique(cluster2)))
##set first order clusters based on the heatmap:
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=n_SR101)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP number of SR101 in 30um radius distribution") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")),na.value="NA")
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=min_SR101)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP min distance to SR101 in 30um radius distribution") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = RColorBrewer::brewer.pal(4, "Spectral"),na.value="NA", trans="pseudo_log")
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=min_MG)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP min distance to MG in 30um radius distribution") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = RColorBrewer::brewer.pal(4, "Spectral"),na.value="NA", trans="pseudo_log")
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=n_MG)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP number of MG in 30um radius distribution") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")),na.value="NA")
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=BV_mean)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP mean distance to Blood vessels") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")),na.value="NA", trans="pseudo_log")
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=BV_min)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP mean contact with Blood vessels") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")),na.value="NA")
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=dist_10_neigh)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP big neighbours distribution") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")),trans="pseudo_log")
ggplot(master_class_sum, mapping=aes(x=V1, y=V2, color=dist_3_neigh)) +
geom_point(size=2, alpha=0.6) +
xlab("") + ylab("") +
ggtitle("UMAP close neighbours distribution") +
theme_light(base_size=20) +
theme(axis.text.x=element_blank(),
axis.text.y=element_blank())+theme(aspect.ratio=1)+
scale_color_gradientn(colours = rev(RColorBrewer::brewer.pal(4, "Spectral")),trans="pseudo_log")
This section analyzes the differences between clusters for small-scale features and provides statistical support in each case.
### plot differences of the values per cluster
master_class_sum<-as.data.frame(master_class_sum)
master_class_sum$cluster2<-as.character(master_class_sum$cluster2)
df1$mean_movement<-df1$movement
master_class_sum2<-left_join(master_class_sum, df1[,c("cluster2","mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels=df1$cluster2)
#### try an anova between clusters based on distance to dist_3_neigh
p_dist3_neigth <- ggplot(master_class_sum2,aes(x=as.factor(cluster2) , y=dist_3_neigh, group=cluster2, fill=cluster2)) +geom_jitter(width=0.2, alpha=0.5) +
geom_boxplot(alpha=0.5, outlier.colour = NA)+ggtitle("dist 3 neigh per cluster")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(10,70))
a_dist_3_neigh <- aov(dist_3_neigh ~ cluster2, data=master_class_sum2)
#### try an anova between clusters based on distance to dist_10_neigh
p_dist_10_neigh <- ggplot(master_class_sum2,aes(x=as.factor(cluster2) , y=dist_10_neigh, group=cluster2,fill=cluster2)) +geom_jitter(width=0.2, alpha=0.5) +
geom_boxplot( outlier.colour = NA, alpha=0.5) +ggtitle("dist 10 neigh per cluster")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(20,90))
a_dist_10_neigh <- aov(dist_10_neigh ~ cluster2, data=master_class_sum2)
#### try an anova between clusters based on dist to MG
p_min_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)) ,aes(x=as.factor(cluster2) , y=min_MG, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot( outlier.colour = NA, alpha=0.5) +ggtitle("distance to closest MG")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(2,45))
a_min_MG <- aov(min_MG ~ cluster2, data= subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on dist to SR101
p_min_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)) ,aes(x=as.factor(cluster2) , y=min_SR101, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot(outlier.colour = NA, alpha=0.5) +ggtitle("distance to closest SR101")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,50))
a_min_SR101 <- aov(min_SR101 ~ cluster2, data= subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on n of MG
p_n_MG <- ggplot(subset(master_class_sum2, !is.na(n_MG)) ,aes(x=as.factor(cluster2) , y=n_MG, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot( outlier.colour = NA, alpha=0.5) +ggtitle("n MG")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,15))
a_n_MG <- aov(n_MG ~ cluster2, data= subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on n of SR101
p_nSR101 <- ggplot(subset(master_class_sum2, !is.na(n_SR101)) ,aes(x=as.factor(cluster2) , y=n_SR101, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot(outlier.colour = NA, alpha=0.5) +ggtitle("n SR101")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,20))
a_nSR101 <- aov(n_SR101 ~ cluster2, data= subset(master_class_sum2, !is.na(n_SR101)))
#### try an anova between clusters based on mean distance to BV
p_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)) ,aes(x=as.factor(cluster2) , y=BV_mean, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot(outlier.colour = NA, alpha=0.5) +ggtitle("BV_mean")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,20))
a_BV_mean <- aov(BV_mean ~ cluster2, data= subset(master_class_sum2, !is.na(BV_mean)))
#### try an anova between clusters based on min distance to BV
p_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)) ,aes(x=as.factor(cluster2) , y=BV_min, group=cluster2, fill=cluster2)) +geom_jitter(width=0.2, alpha=0.5) +
geom_boxplot( outlier.colour = NA, alpha=0.5) +ggtitle("BV_min")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,20))
a_BV_min <- aov(BV_min ~ cluster2, data= subset(master_class_sum2, !is.na(BV_min)))
#### try an anova between clusters based on contact to BV
p_BV_contact <- ggplot(subset(master_class_sum2, !is.na(BV_min)) ,aes(x=as.factor(cluster2) , y=BV_contact, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot(outlier.colour = NA, alpha=0.5) +ggtitle("BV_contact")+theme_classic()+ scale_fill_manual(values=mypalette_1)+
theme(aspect.ratio=0.7)
a_BV_contact <- aov(BV_contact ~ cluster2, data= subset(master_class_sum2, !is.na(BV_contact)))
#### try an anova between clusters based on sd distance to BV
p_BV_sd <- ggplot(subset(master_class_sum2, !is.na(BV_min)) ,aes(x=as.factor(cluster2) , y=BV_sd, group=cluster2, fill=cluster2)) +geom_jitter(width=0.3, alpha=0.5) +
geom_boxplot( outlier.colour = NA, alpha=0.5) +ggtitle("Standart deviation distance")+theme_classic()+
theme(aspect.ratio=0.7)+ scale_fill_manual(values=mypalette_1)+coord_cartesian(ylim = c(0,1.7))
a_BV_sd <- aov(BV_sd ~ cluster2, data= subset(master_class_sum2, !is.na(BV_min)))
Statistical information based on ANOVA and Tukey’s HSD (Honestly
Significant Difference) test:
summary(a_dist_3_neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 5583 930.5 4.259 0.000304 ***
## Residuals 974 212825 218.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_3_neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_3_neigh ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 1-5 4.2270133 -1.07275492 9.526781 0.2185405
## 7-5 3.2882243 -1.32150539 7.897954 0.3489151
## 4-5 5.3326959 0.07378605 10.591606 0.0443659
## 2-5 6.7891687 1.67457635 11.903761 0.0018140
## 6-5 7.3128488 1.55167888 13.074019 0.0035271
## 3-5 7.4835679 1.34586491 13.621271 0.0060985
## 7-1 -0.9387890 -5.65919554 3.781618 0.9971661
## 4-1 1.1056826 -4.25050661 6.461872 0.9965152
## 2-1 2.5621554 -2.65240930 7.776720 0.7732634
## 6-1 3.0858355 -2.76426782 8.935939 0.7089385
## 3-1 3.2565546 -2.96470154 9.477811 0.7164002
## 4-7 2.0444716 -2.63001527 6.718959 0.8556324
## 2-7 3.5009444 -1.01056860 8.012457 0.2484702
## 6-7 4.0246245 -1.20851733 9.257766 0.2586165
## 3-7 4.1953436 -1.44966238 9.840350 0.2986409
## 2-4 1.4564728 -3.71656068 6.629506 0.9816231
## 6-4 1.9801529 -3.83296152 7.793267 0.9525986
## 3-4 2.1508719 -4.03561476 8.337359 0.9477558
## 6-2 0.5236801 -5.15920830 6.206568 0.9999667
## 3-2 0.6943992 -5.36988474 6.758683 0.9998804
## 3-6 0.1707191 -6.44800913 6.789447 1.0000000
summary(a_dist_10_neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 14631 2438.5 5.999 3.57e-06 ***
## Residuals 974 395902 406.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_dist_10_neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_10_neigh ~ cluster2, data = master_class_sum2)
##
## $cluster2
## diff lwr upr p adj
## 1-5 6.2224350 -1.0059098 13.450780 0.1451596
## 7-5 4.8525599 -1.4346425 11.139762 0.2545632
## 4-5 7.5392685 0.3666503 14.711887 0.0320151
## 2-5 9.3499417 2.3741580 16.325725 0.0015577
## 6-5 12.9036631 5.0460134 20.761313 0.0000293
## 3-5 12.3821039 4.0109014 20.753306 0.0002764
## 7-1 -1.3698752 -7.8080296 5.068279 0.9958819
## 4-1 1.3168335 -5.9884639 8.622131 0.9983664
## 2-1 3.1275067 -3.9846291 10.239643 0.8524115
## 6-1 6.6812281 -1.2977178 14.660174 0.1699298
## 3-1 6.1596689 -2.3254917 14.644829 0.3271879
## 4-7 2.6867086 -3.6888161 9.062233 0.8762512
## 2-7 4.4973819 -1.6558630 10.650627 0.3188377
## 6-7 8.0511033 0.9136301 15.188576 0.0155433
## 3-7 7.5295441 -0.1696699 15.228758 0.0600653
## 2-4 1.8106733 -5.2448182 8.866165 0.9886692
## 6-4 5.3643946 -2.5641021 13.292891 0.4158779
## 3-4 4.8428354 -3.5949032 13.280574 0.6190955
## 6-2 3.5537214 -4.1971604 11.304603 0.8256201
## 3-2 3.0321622 -5.2389043 11.303229 0.9330830
## 3-6 -0.5215592 -9.5488313 8.505713 0.9999979
summary(a_min_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 753 125.4 0.866 0.52
## Residuals 586 84875 144.8
TukeyHSD(a_min_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $cluster2
## diff lwr upr p adj
## 1-5 1.81622585 -3.697156 7.329607 0.9593052
## 7-5 -0.40915078 -5.436036 4.617735 0.9999838
## 4-5 -0.42169288 -6.675089 5.831703 0.9999947
## 2-5 1.20268356 -4.022386 6.427753 0.9936064
## 6-5 0.08566459 -6.075000 6.246329 1.0000000
## 3-5 -2.34882336 -8.855682 4.158035 0.9371911
## 7-1 -2.22537663 -6.993653 2.542900 0.8120402
## 4-1 -2.23791873 -8.285385 3.809547 0.9295778
## 2-1 -0.61354229 -5.590312 4.363227 0.9998137
## 6-1 -1.73056126 -7.682088 4.220966 0.9781188
## 3-1 -4.16504921 -10.474256 2.144157 0.4459793
## 4-7 -0.01254210 -5.620041 5.594957 1.0000000
## 2-7 1.61183434 -2.819919 6.043587 0.9349676
## 6-7 0.49481537 -5.009081 5.998712 0.9999708
## 3-7 -1.93967258 -7.828500 3.949155 0.9593289
## 2-4 1.62437644 -4.161452 7.410205 0.9817153
## 6-4 0.50735747 -6.135553 7.150268 0.9999889
## 3-4 -1.92713048 -8.892306 5.038045 0.9830422
## 6-2 -1.11701897 -6.802495 4.568458 0.9973229
## 3-2 -3.55150692 -9.610390 2.507376 0.5932220
## 3-6 -2.43448796 -9.316530 4.447554 0.9428443
summary(a_min_SR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 1673 278.9 1.736 0.111
## Residuals 381 61201 160.6
TukeyHSD(a_min_SR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $cluster2
## diff lwr upr p adj
## 1-5 0.68094708 -6.8911649 8.253059 0.9999703
## 7-5 3.48835450 -2.4912158 9.467925 0.5967997
## 4-5 6.07625307 -0.2418227 12.394329 0.0683548
## 2-5 3.41671437 -4.5559583 11.389387 0.8651355
## 6-5 1.73466735 -5.9590928 9.428427 0.9942036
## 3-5 4.20416185 -4.1083764 12.516700 0.7452901
## 7-1 2.80740742 -4.1916532 9.806468 0.8981057
## 4-1 5.39530599 -1.8950760 12.685688 0.3013475
## 2-1 2.73576730 -6.0275139 11.499048 0.9683490
## 6-1 1.05372027 -7.4565989 9.564039 0.9998059
## 3-1 3.52321477 -5.5503666 12.596796 0.9116317
## 4-7 2.58789857 -3.0306466 8.206444 0.8199196
## 2-7 -0.07164013 -7.5022174 7.358937 1.0000000
## 6-7 -1.75368715 -8.8841790 5.376805 0.9907428
## 3-7 0.71580735 -7.0783097 8.509924 0.9999664
## 2-4 -2.65953869 -10.3651404 5.046063 0.9485273
## 6-4 -4.34158572 -11.7582382 3.075067 0.5927856
## 3-4 -1.87209122 -9.9288325 6.184650 0.9931752
## 6-2 -1.68204702 -10.5506525 7.186558 0.9977684
## 3-2 0.78744748 -8.6229993 10.197894 0.9999806
## 3-6 2.46949450 -6.7058499 11.644839 0.9851112
summary(a_n_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 76 12.73 1.262 0.273
## Residuals 586 5911 10.09
TukeyHSD(a_n_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_MG ~ cluster2, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $cluster2
## diff lwr upr p adj
## 1-5 -0.59040959 -2.0454545 0.8646354 0.8939285
## 7-5 0.48178211 -0.8448709 1.8084351 0.9354167
## 4-5 0.09090909 -1.5594341 1.7412523 0.9999984
## 2-5 0.17515818 -1.2037978 1.5541141 0.9997783
## 6-5 -0.32071319 -1.9465835 1.3051572 0.9972626
## 3-5 0.25417440 -1.4630605 1.9714093 0.9994628
## 7-1 1.07219170 -0.1862115 2.3305949 0.1536807
## 4-1 0.68131868 -0.9146773 2.2773146 0.8684859
## 2-1 0.76556777 -0.5478590 2.0789946 0.5998032
## 6-1 0.26969641 -1.3009801 1.8403729 0.9987449
## 3-1 0.84458399 -0.8204884 2.5096563 0.7444379
## 4-7 -0.39087302 -1.8707566 1.0890106 0.9866844
## 2-7 -0.30662393 -1.4762146 0.8629668 0.9871948
## 6-7 -0.80249529 -2.2550369 0.6500464 0.6598973
## 3-7 -0.22760771 -1.7817372 1.3265218 0.9994946
## 2-4 0.08424908 -1.4426978 1.6111960 0.9999984
## 6-4 -0.41162228 -2.1647628 1.3415183 0.9928820
## 3-4 0.16326531 -1.6749248 2.0014554 0.9999729
## 6-2 -0.49587136 -1.9963341 1.0045914 0.9586684
## 3-2 0.07901622 -1.5199928 1.6780252 0.9999992
## 3-6 0.57488758 -1.2413626 2.3911377 0.9665241
summary(a_nSR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 294 49.0 3.378 0.00294 **
## Residuals 381 5526 14.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_nSR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_SR101 ~ cluster2, data = subset(master_class_sum2, !is.na(n_SR101)))
##
## $cluster2
## diff lwr upr p adj
## 1-5 -0.2250000 -2.500401 2.05040144 0.9999480
## 7-5 -1.5958738 -3.392720 0.20097271 0.1191744
## 4-5 -2.4351266 -4.333693 -0.53656002 0.0031509
## 2-5 -2.0661765 -4.461945 0.32959249 0.1426613
## 6-5 -1.8881579 -4.200114 0.42379852 0.1926995
## 3-5 -1.2583333 -3.756231 1.23956440 0.7488373
## 7-1 -1.3708738 -3.474075 0.73232708 0.4603374
## 4-1 -2.2101266 -4.400869 -0.01938435 0.0464098
## 2-1 -1.8411765 -4.474521 0.79216845 0.3712472
## 6-1 -1.6631579 -4.220488 0.89417255 0.4631756
## 3-1 -1.0333333 -3.759923 1.69325604 0.9207076
## 4-7 -0.8392528 -2.527612 0.84910652 0.7605062
## 2-7 -0.4703027 -2.703173 1.76256791 0.9960059
## 6-7 -0.2922841 -2.434980 1.85041154 0.9996594
## 3-7 0.3375405 -2.004573 2.67965387 0.9995320
## 2-4 0.3689501 -1.946565 2.68446490 0.9991673
## 6-4 0.5469687 -1.681718 2.77565491 0.9908467
## 3-4 1.1767932 -1.244238 3.59782462 0.7792460
## 6-2 0.1780186 -2.486976 2.84301317 0.9999949
## 3-2 0.8078431 -2.019973 3.63565976 0.9797490
## 3-6 0.6298246 -2.127344 3.38699347 0.9937772
summary(a_BV_mean)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 132 22.02 0.627 0.708
## Residuals 625 21940 35.10
TukeyHSD(a_BV_mean)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_mean ~ cluster2, data = subset(master_class_sum2, !is.na(BV_mean)))
##
## $cluster2
## diff lwr upr p adj
## 1-5 0.08803219 -2.521247 2.697311 0.9999999
## 7-5 0.16105899 -1.990331 2.312449 0.9999902
## 4-5 0.69722163 -1.682814 3.077257 0.9773097
## 2-5 -0.33683644 -3.130061 2.456388 0.9998365
## 6-5 0.60332716 -2.161136 3.367791 0.9952234
## 3-5 -1.04138366 -4.026316 1.943548 0.9465171
## 7-1 0.07302680 -2.356534 2.502587 1.0000000
## 4-1 0.60918944 -2.024982 3.243360 0.9934517
## 2-1 -0.42486863 -3.437573 2.587836 0.9995942
## 6-1 0.51529497 -2.470764 3.501354 0.9987112
## 3-1 -1.12941585 -4.320671 2.061839 0.9427934
## 4-7 0.53616264 -1.645350 2.717675 0.9909170
## 2-7 -0.49789542 -3.124020 2.128229 0.9978098
## 6-7 0.44226817 -2.153245 3.037781 0.9988004
## 3-7 -1.20244265 -4.031621 1.626735 0.8710146
## 2-4 -1.03405806 -3.850549 1.782433 0.9321688
## 6-4 -0.09389447 -2.881865 2.694076 0.9999999
## 3-4 -1.73860529 -4.745321 1.268110 0.6093002
## 6-2 0.94016360 -2.207901 4.088228 0.9749851
## 3-2 -0.70454723 -4.047880 2.638785 0.9960593
## 3-6 -1.64471083 -4.964052 1.674631 0.7651481
summary(a_BV_min)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 507 84.47 3.002 0.00672 **
## Residuals 625 17587 28.14
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_min)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_min ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $cluster2
## diff lwr upr p adj
## 1-5 -0.4451385 -2.7812665 1.8909895 0.9977475
## 7-5 0.7207875 -1.2053851 2.6469600 0.9259982
## 4-5 0.9452020 -1.1856808 3.0760848 0.8463028
## 2-5 -1.1545119 -3.6553290 1.3463052 0.8199566
## 6-5 0.5472337 -1.9278334 3.0223008 0.9948716
## 3-5 -2.0710799 -4.7435359 0.6013761 0.2492610
## 7-1 1.1659260 -1.0092975 3.3411494 0.6917646
## 4-1 1.3903405 -0.9680737 3.7487547 0.5867217
## 2-1 -0.7093734 -3.4066947 1.9879480 0.9870013
## 6-1 0.9923722 -1.6810925 3.6658368 0.9286743
## 3-1 -1.6259414 -4.4831213 1.2312386 0.6275307
## 4-7 0.2244145 -1.7287276 2.1775567 0.9998768
## 2-7 -1.8752993 -4.2265095 0.4759108 0.2179366
## 6-7 -0.1735538 -2.4973567 2.1502491 0.9999903
## 3-7 -2.7918673 -5.3248742 -0.2588604 0.0200115
## 2-4 -2.0997139 -4.6213621 0.4219344 0.1747653
## 6-4 -0.3979683 -2.8940814 2.0981448 0.9991787
## 3-4 -3.0162819 -5.7082411 -0.3243226 0.0167952
## 6-2 1.7017455 -1.1167657 4.5202568 0.5580847
## 3-2 -0.9165680 -3.9099055 2.0767695 0.9716436
## 3-6 -2.6183135 -5.5901716 0.3535445 0.1258270
summary(a_BV_contact)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 0.72 0.1204 0.741 0.617
## Residuals 625 101.49 0.1624
TukeyHSD(a_BV_contact)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_contact ~ cluster2, data = subset(master_class_sum2, !is.na(BV_contact)))
##
## $cluster2
## diff lwr upr p adj
## 1-5 0.0022505559 -0.1752174 0.17971853 1.0000000
## 7-5 0.0089169422 -0.1374081 0.15524195 0.9999971
## 4-5 0.0378651888 -0.1240110 0.19974135 0.9930357
## 2-5 0.0369412193 -0.1530376 0.22692008 0.9974772
## 6-5 0.0746015532 -0.1134212 0.26262427 0.9038935
## 3-5 -0.0710754471 -0.2740932 0.13194226 0.9456334
## 7-1 0.0066663863 -0.1585782 0.17191096 0.9999998
## 4-1 0.0356146329 -0.1435463 0.21477561 0.9971473
## 2-1 0.0346906634 -0.1702160 0.23959730 0.9988436
## 6-1 0.0723509973 -0.1307433 0.27544533 0.9410075
## 3-1 -0.0733260029 -0.2903766 0.14372457 0.9541101
## 4-7 0.0289482466 -0.1194255 0.17732204 0.9974294
## 2-7 0.0280242771 -0.1505894 0.20663799 0.9992506
## 6-7 0.0656846110 -0.1108471 0.24221629 0.9278901
## 3-7 -0.0799923893 -0.2724166 0.11243182 0.8824856
## 2-4 -0.0009239695 -0.1924853 0.19063736 1.0000000
## 6-4 0.0367363644 -0.1528851 0.22635788 0.9975284
## 3-4 -0.1089406359 -0.3134399 0.09555867 0.6979671
## 6-2 0.0376603339 -0.1764527 0.25177337 0.9985642
## 3-2 -0.1080166664 -0.3354107 0.11937735 0.7991213
## 3-6 -0.1456770002 -0.3714393 0.08008529 0.4752505
summary(a_BV_sd)
## Df Sum Sq Mean Sq F value Pr(>F)
## cluster2 6 6.95 1.1586 10.58 3.32e-11 ***
## Residuals 625 68.44 0.1095
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_BV_sd)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_sd ~ cluster2, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $cluster2
## diff lwr upr p adj
## 1-5 0.096038865 -0.04969802 0.241775748 0.4484503
## 7-5 -0.105339661 -0.22550191 0.014822585 0.1297847
## 4-5 -0.055998998 -0.18893188 0.076933879 0.8757207
## 2-5 0.168771120 0.01276028 0.324781960 0.0242357
## 6-5 -0.005001466 -0.15940592 0.149402986 0.9999999
## 3-5 0.218443315 0.05172496 0.385161666 0.0022490
## 7-1 -0.201378526 -0.33707755 -0.065679506 0.0002677
## 4-1 -0.152037863 -0.29916505 -0.004910678 0.0375260
## 2-1 0.072732255 -0.09553729 0.241001803 0.8616860
## 6-1 -0.101040331 -0.26782161 0.065740943 0.5539608
## 3-1 0.122404450 -0.05583771 0.300646609 0.3958309
## 4-7 0.049340663 -0.07250405 0.171185379 0.8949695
## 2-7 0.274110781 0.12743301 0.420788548 0.0000010
## 6-7 0.100338195 -0.04462980 0.245306191 0.3858679
## 3-7 0.323782976 0.16576401 0.481801942 0.0000000
## 2-4 0.224770118 0.06745975 0.382080485 0.0005414
## 6-4 0.050997532 -0.10471985 0.206714918 0.9604983
## 3-4 0.274442313 0.10650727 0.442377354 0.0000346
## 6-2 -0.173772586 -0.34960244 0.002057266 0.0551349
## 3-2 0.049672195 -0.13706401 0.236408401 0.9862026
## 3-6 0.223444781 0.03804855 0.408841014 0.0071394
The output in this section shows a boxplot for each small-scale feature
in each behavioral cluster
p_dist3_neigth
p_dist_10_neigh
p_min_MG
p_min_SR101
p_n_MG
p_nSR101
p_BV_mean
p_BV_min
p_BV_contact
p_BV_sd
This section analyzes the differences between large-scale regions for small-scale features and provides statistical support in each case.
####Differences based on environmental clusters
### plot differences of the values per cluster
master_class_sum<-as.data.frame(master_class_sum)
master_class_sum$cluster2<-as.character(master_class_sum$cluster2)
df1$mean_movement<-df1$movement
master_class_sum2<-left_join(master_class_sum, df1[,c("cluster2","mean_movement")])
master_class_sum2$cluster2 = factor(master_class_sum2$cluster2, levels=df1$cluster2)
#### try an anova between clusters based on distance to dist_10_neigh
p_class_dist_10neigh <- ggplot(master_class_sum2,aes(x=as.factor(class) , y=dist_10_neigh, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red")) +ggtitle("dist 10 neigh per cluster")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(20,70))
a_class_dist_10neigh <- aov(dist_10_neigh ~ class, data=master_class_sum2)
#### try an anova between clusters based on dist to MG
p_class_minMG <- ggplot(subset(master_class_sum2, !is.na(min_MG)) ,aes(x=as.factor(class) , y=min_MG, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red")) +ggtitle("distance to closest MG")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(0,50))
a_class_minMG <- aov(min_MG ~ class, data= subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on dist to SR101
p_classmin_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)) ,aes(x=as.factor(class) , y=min_SR101, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA) +stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red"))+ggtitle("distance to closest SR101")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(0,50))
a_classmin_SR101 <- aov(min_SR101 ~ class, data= subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on n of MG
p_class_n_MG <- ggplot(subset(master_class_sum2, !is.na(min_MG)),aes(x=as.factor(class) , y=n_MG, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA) +stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red"))+ggtitle("n of MG per cluster")+theme_classic()+
theme(aspect.ratio=1)
a_class_n_MG <- aov(n_MG ~ class, data= subset(master_class_sum2, !is.na(min_MG)))
#### try an anova between clusters based on n of SR101
p_classn_SR101 <- ggplot(subset(master_class_sum2, !is.na(min_SR101)),aes(x=as.factor(class) , y=n_SR101, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+scale_fill_manual(values=c("cyan","gold","red"))+ggtitle("n SR101 per cluster")+theme_classic()+
theme(aspect.ratio=1)
a_classn_SR101 <- aov(n_SR101 ~ class, data= subset(master_class_sum2, !is.na(min_SR101)))
#### try an anova between clusters based on BV distance min
p_class_BV_min <- ggplot(subset(master_class_sum2, !is.na(BV_min)) ,aes(x=as.factor(class) , y=BV_min, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+ scale_fill_manual(values=c("cyan","gold","red")) +ggtitle("min distance to BV")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(0,18))
a_class_BV_min <- aov(BV_min ~ class, data= subset(master_class_sum2, !is.na(BV_min)))
#### try an anova between clusters based on BV distance mean
p_class_BV_mean <- ggplot(subset(master_class_sum2, !is.na(BV_mean)) ,aes(x=as.factor(class) , y=BV_mean, fill=class)) +
geom_violin(alpha=1, outlier.colour = NA)+stat_summary(fun = median, geom = "crossbar", size = 1, color = "grey25")+ scale_fill_manual(values=c("cyan","gold","red")) +ggtitle("mean distance to BV")+theme_classic()+
theme(aspect.ratio=1)+coord_cartesian(ylim = c(0,18))
a_class_BV_mean <- aov(BV_mean ~ class, data= subset(master_class_sum2, !is.na(BV_mean)))
##relation between position relative to the tumor and amount of MG:
pcor_MG_position <- ggplot(subset(master_class_sum2, !is.na(min_MG)),aes(x=distance_to_tumor , y=n_MG)) +geom_jitter()+
geom_point(alpha=0.5,stat = "summary") +geom_smooth(method = "lm")+ggtitle("scatter plot movement vs min_MG")+theme_classic()+
theme(aspect.ratio=1)
cor_18<-cor.test(master_class_sum2$n_MG,master_class_sum2$distance_to_tumor, method = "pearson", use = "pairwise.complete.obs")
Statistical information between large-scale regions (environmental
clusters) based on ANOVA and Tukey’s HSD (Honestly Significant
Difference) test:
summary(a_class_dist_10neigh)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 1320 659.9 1.577 0.207
## Residuals 978 409213 418.4
TukeyHSD(a_class_dist_10neigh)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = dist_10_neigh ~ class, data = master_class_sum2)
##
## $class
## diff lwr upr p adj
## CL2-CL1 -2.5151366 -6.376437 1.346164 0.2777655
## CL3-CL1 -0.1387336 -3.950374 3.672907 0.9959840
## CL3-CL2 2.3764030 -1.265054 6.017860 0.2764354
summary(a_class_minMG)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 2813 1406.4 10.02 5.26e-05 ***
## Residuals 590 82815 140.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_minMG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -5.075902 -7.941646 -2.210159 0.0001074
## CL3-CL1 -3.773329 -6.454107 -1.092551 0.0028689
## CL3-CL2 1.302574 -1.640317 4.245465 0.5519149
summary(a_classmin_SR101 )
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 3438 1719.0 11.13 1.99e-05 ***
## Residuals 385 59436 154.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_classmin_SR101 )
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = min_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -9.024564 -13.534491 -4.514638 0.0000104
## CL3-CL1 -7.309367 -11.897598 -2.721135 0.0006007
## CL3-CL2 1.715198 -1.496459 4.926854 0.4207669
summary(a_class_n_MG)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 644 322.0 35.55 2.64e-15 ***
## Residuals 590 5344 9.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_n_MG)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_MG ~ class, data = subset(master_class_sum2, !is.na(min_MG)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 2.592120 1.8641533 3.3200858 0.0000000
## CL3-CL1 1.337224 0.6562435 2.0182050 0.0000145
## CL3-CL2 -1.254895 -2.0024589 -0.5073317 0.0002641
summary(a_classn_SR101)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 422 210.98 15.05 5.11e-07 ***
## Residuals 385 5398 14.02
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_classn_SR101)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = n_SR101 ~ class, data = subset(master_class_sum2, !is.na(min_SR101)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 2.6028601 1.2436788 3.962041 0.0000261
## CL3-CL1 3.2029326 1.8201521 4.585713 0.0000003
## CL3-CL2 0.6000725 -0.3678421 1.567987 0.3121099
summary(a_class_BV_min)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 591 295.71 10.63 2.89e-05 ***
## Residuals 629 17502 27.82
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_BV_min)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_min ~ class, data = subset(master_class_sum2, !is.na(BV_min)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -1.059503 -2.238974 0.1199684 0.0885449
## CL3-CL1 -2.422248 -3.657506 -1.1869908 0.0000148
## CL3-CL2 -1.362746 -2.580036 -0.1454545 0.0237365
summary(a_class_BV_mean)
## Df Sum Sq Mean Sq F value Pr(>F)
## class 2 463 231.36 6.735 0.00128 **
## Residuals 629 21609 34.35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(a_class_BV_mean)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = BV_mean ~ class, data = subset(master_class_sum2, !is.na(BV_mean)))
##
## $class
## diff lwr upr p adj
## CL2-CL1 -1.3233714 -2.633946 -0.01279692 0.0471867
## CL3-CL1 -2.1065180 -3.479080 -0.73395634 0.0009792
## CL3-CL2 -0.7831466 -2.135745 0.56945151 0.3625740
Statistical information based on correlation test between position
relative to the tumor and amount of MG:
cor_18
##
## Pearson's product-moment correlation
##
## data: master_class_sum2$n_MG and master_class_sum2$distance_to_tumor
## t = -0.14404, df = 591, p-value = 0.8855
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.08639938 0.07462654
## sample estimates:
## cor
## -0.005924827
The output in this section shows a boxplot for each feature in each behavioral cluster according to large-scale regions:
p_class_dist_10neigh
p_class_minMG
p_classmin_SR101
p_class_n_MG
p_classn_SR101
p_class_BV_min
p_class_BV_mean
Thank you for reading the BEHAV3D Tumor Profiler - Guided Tutorial.